This notebook series was updated from the previous one, sds-2-x-dl, on 2022-01-17. See the table below for changes from the previous version, as well as current flaws that need revision.
Thanks to ...
Table of changes and current flaws
| Notebook | Changes |
|---|---|
| 031 | cmd02: updated instructions |
| 031 | cmd09: added contains in predicates |
| 031 | cmd13, cmd16: deleted as they were the same as cmd12, cmd14 |
| 031 | cmd21: added markdown mentioning the run time for the two methods |
| 031 | cmd64: added markdown with shapefile info |
| 031 | cmd71: deleted, redundant |
| 031a | cmd11: deleted, it was commented out and not necessary |
| 031a | cmd13: added comments |
| 031a | cmd20: added comments |
| 031a | cmd24: deleted redundant comments |
| 031a | cmd25: deleted redundant comments |
| 031a | cmd28-cmd36: added exercise solution (tiny area center of Stockholm) |
| 032 | cmd24-28: downloaded the 2017 data and created the schema. The schema uses pick-up and drop-off location IDs instead of coordinates, so we could not proceed with Magellan's points. |
| 032a | cmd9: added markdown for disk usage |
| 032a | deleted previous commands cmd35-36 as their respective notebook is missing |
| 032d | cmd2: link not working (to be fixed by Raaz?) |
| 032d | cmd5: added markdown to explain how to load the data |
| 032d | cmd6-10: added cells to load the data |
| 032d | cmd11: added markdown with info about creating tables |
| 032d | cmd12-13: created mobile_sample table |
| 032d | cmd14: added markdown about missing data: from mobile_sample (DeviceMake, ClientId and Country columns) and no data about country codes |
| 032d | cmd15-24: not working because of missing data |
Intro to GIS
Raazesh Sainudiin, Marina Toger
Birth of GIS: 1854 Cholera outbreak in London
Dr. John Snow, the father of modern epidemiology, GIS and spatial analysis, hypothesised by mapping the cases that cholera was transmitted through drinking polluted water, rather than through the air as was commonly believed.

displayHTML(frameIt("https://ds8.gitbooks.io/textbook/content/chapters/02/1/observation-and-visualization-john-snow-and-the-broad-street-pump.html",555))
GIS components
From: Longley, P. A., Goodchild, M. F., Maguire, D. J., & Rhind, D. W. (2005). Geographical information systems and science.
Applications of GIS
GIS software
Source: Kauri_Kiiman, 2013
displayHTML(frameIt("https://en.wikipedia.org/wiki/List_of_geographic_information_systems_software",555))
The majority of governmental agencies and information providers are still on proprietary, mostly desktop, GIS using legacy data formats like shapefiles. Researchers in various fields (e.g. ecology, geography, regional science) often use Python, R and PostGIS SQL, combined with desktop GIS software for visualisation.
Some of the biggest players:
- proprietary Desktop ArcGIS by ESRI (Windows OS)
- free and open-source QGIS (large community of developers, cross-platform)
- free R, a software environment for statistical computing and graphics; some people use R as a GIS

These are just a few of many software packages, platforms and tools.
Why Magellan? Because it is scalable.
From Ram's slide 12 of Magellan FOSS4G Talk, Boston 2017 at slideshare
In the future we might add GeoMesa
Do we need one more geospatial analytics library?
From Ram's slide 4 of this Spark Summit East 2016 talk at slideshare:
- Spatial Analytics at scale is challenging
- Simplicity + Scalability = Hard
- Ancient Data Formats
- metadata, indexing not handled well, inefficient storage
- Geospatial Analytics is not simply Business Intelligence anymore
- Statistical + Machine Learning being leveraged in geospatial
- Now is the time to do it!
- Explosion of mobile data
- Finer granularity of data collection for geometries
- Analytics stretching the limits of traditional approaches
- Spark SQL + Catalyst + Tungsten makes extensible SQL engines easier than ever before!
Crash course in GIS using QGIS software locally on your laptops.
I learned from Ujaval Gandhi's excellent albeit outdated QGIS tutorials as well as a lot of playing around. Here I use some of his materials but if you are really interested in geospatial data, I suggest you follow his original QGIS tutorials for a fast dive into the GIS world. You can also learn using A Gentle Introduction to GIS and the Training Materials from the QGIS docs and more.
About
From qgis.org: QGIS is a user friendly Open Source Geographic Information System (GIS) licensed under the GNU Public License (GPL) Version 2 or above. QGIS is an official project of the Open Source Geospatial Foundation (OSGeo). It runs on Linux, Unix, Mac OSX, Windows and Android and supports numerous vector, raster, and database formats and functionalities.
Here we shall use QGIS 3.0.2-Girona, the fresh-out-of-the-oven current version (as of 2018-04-29), released 2018-04-20 and based on Python 3.6.
1. Setting up
Installation pains
Basically, go to the QGIS download page and follow the instructions for your OS. We add step-by-step tutorials for the OS versions we tried; they were correct on the day we tried them. QGIS has a vibrant community of contributors, so this will get outdated fast.
displayHTML(frameIt("https://qgis.org/en/site/forusers/download.html", 444))
MAC
The following worked for me on OSX Yosemite 10.10.5 on a MBP from early 2011, using QGIS macOS Installer Version 3.0
0. Check if you have Python 3.6+ and install it if not (this isn't in the bundle; only python.org Python 3 is supported)
I already had it:
$ python3
Python 3.6.3 (v3.6.3:2c5fed86e0, Oct 3 2017, 00:32:08)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
If this step fails for some reason, you can install QGIS 2.18 LTR instead, which is currently based on Python 2.7.
1. Install GDAL
2. Install QGIS
Windows
The following worked for me on Windows 10 Enterprise v.1709, using the OSGeo4W Network Installer.
Let's get our hands dirty
Hello world in QGIS
Copy the following to your favourite text editor and save as a csv file:
id, lat, long, name
1, 59.839264, 17.647075, point1
Open QGIS3 and start a new project
Open the data source manager (click the toolbar icon or press ⌘L)
Select Delimited Text > CSV, X and Y field, CRS > Add> Close
You have created a temporary layer containing a point.
GIS Data are stored in Vector or Raster Layers
To save the temporary layer, right-click the layer > Save As
Let's have a look at the options we have:
Now store the point as a file, in shapefile format.
Most common basic vector data structures - ESRI Shapefiles
- Points
- Polygons
- Polylines
Spatial data (invisible to the user in shapefile format) + attribute tables
displayHTML(frameIt("https://en.wikipedia.org/wiki/Shapefile", 444))
List (ls) the folder containing the shapefile you just saved. Six files with the same base name were created:
$ ls -S -lh | awk '{print $5, $9}'
257B point4326.qpj
147B point4326.dbf
143B point4326.prj
128B point4326.shp
108B point4326.shx
5B point4326.cpg
The original CSV, Point1.csv, was 24B.
- Shapefiles were developed by Environmental Systems Research Institute (ESRI). See ESRI's what is a geospatial shape file?
- QGIS and Magellan build on http://esri.github.io/, a leading open-source geospatial library
To prep you for working with Magellan, we explore the basic Geometries and Predicates in QGIS.
Geometries:
- Point
- LineString
- Polygon
- MultiPoint
- MultiPolygon (treated as a collection of Polygons and read in as a row per polygon by the GeoJSON reader)
Predicates:
- Intersects
- Contains
- Within
For more info look at the magellan README in github: https://github.com/harsha2010/magellan
Let's look at Magellan supported formats for geometry
The library currently supports reading the following formats:
ESRI Shapefiles - 788 bytes
An ancient but open and widely used format; most data sources are in shapefiles.
Our point information is contained in 6 files (!); not all of them are required (QGIS creates them by default).
.prj is the projection file. Open it with a text editor and have a look:
GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]
Our default projection is WGS_1984. More on this later...
Let's save our point in other formats.
WKT of our point - 76 bytes
WKT;y;x
"POINT (17.647075 59.839264)";59.839264000000000;17.647075000000001
GeoJSON of our point - 305 bytes
{
"type": "FeatureCollection",
"name": "GJSpoint4326",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
"features": [
{ "type": "Feature", "properties": { "y": 59.839264, "x": 17.647075 }, "geometry": { "type": "Point", "coordinates": [ 17.647075, 59.839264 ] } }
]
}
The numbers are coordinates: long, lat (Easting, Northing) in decimal degrees, like in Google Maps.
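For reference, the same GeoJSON structure can be assembled with just Python's standard json module. This minimal sketch (values copied from the file above) also demonstrates the [longitude, latitude] coordinate order programmatically:

```python
import json

# GeoJSON coordinates are ordered [longitude, latitude] (Easting, Northing)
lon, lat = 17.647075, 59.839264

feature_collection = {
    "type": "FeatureCollection",
    "name": "GJSpoint4326",
    "features": [
        {
            "type": "Feature",
            "properties": {"y": lat, "x": lon},
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
        }
    ],
}

# round-trip through text, then read the coordinates back
parsed = json.loads(json.dumps(feature_collection))
coords = parsed["features"][0]["geometry"]["coordinates"]
print(coords)  # longitude first, then latitude
```

Note that the longitude comes first in the geometry even though we colloquially say "lat, long".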
Datum, reference surface and projection
The objective is to project the earth's surface to Cartesian coordinates.
Source: Nathan P. Belz 2012

Bottom line: whatever projection is selected, there is always some distortion.
HOMEWORK and recommended reading on projections:
- https://kartoweb.itc.nl/geometrics/Introduction/introduction.html
- https://kartoweb.itc.nl/geometrics/Reference%20surfaces/body.htm
- See here for more on projections.
Geographic vs Projected Coordinate Systems
Image source: Jochen Albrecht
Geographic Coordinate Systems (GCS)
- Location measured from the curved surface of the earth
- Measurement units: latitude and longitude
- Degrees-minutes-seconds (DMS)
- Decimal degrees (DD) or radians (rad)

Projected Coordinate Systems (PCS)
- Flat surface
- Units can be in meters, feet, inches
- Distortions will occur, except for very fine scale maps
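To make the units concrete, here is a tiny hypothetical Python helper (not part of QGIS) converting degrees-minutes-seconds to decimal degrees; 59° 50' 21.35" is very close to the latitude of our Uppsala point:

```python
def dms_to_dd(degrees: int, minutes: int, seconds: float) -> float:
    """Convert degrees-minutes-seconds to decimal degrees."""
    sign = -1 if degrees < 0 else 1
    return sign * (abs(degrees) + minutes / 60 + seconds / 3600)

# 59 degrees, 50 minutes, 21.35 seconds North
dd = dms_to_dd(59, 50, 21.35)
print(round(dd, 6))
```

Southern latitudes and western longitudes are handled here by a negative degrees argument.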
Spatial Reference System Identifier (SRID)
SRIDs are numeric codes identifying spatial reference systems.
A list of SRIDs with their attributes: https://spatialreference.org/ref/epsg/3006/
Coordinate transformations
Source: geoXchange
Back to our point: let's compare its WKT in two different CRSs:
- WGS84 (4326)
- SWEREF99 (3006)

To do that, save the point in each of the formats we looked at (shapefile, CSV-WKT, and GeoJSON), but this time with a different CRS.
Note that the transformed point shapefiles are larger than the originals.
$ pwd
.../3006
$ ls -S -lh | awk '{print $5, $9}'
570B point3006.qpj
379B point3006.prj
100B point3006.shp
100B point3006.shx
98B point3006.dbf
5B point3006.cpg
$ pwd
.../4326
$ ls -S -lh | awk '{print $5, $9}'
257B point4326.qpj
147B point4326.dbf
143B point4326.prj
128B point4326.shp
108B point4326.shx
5B point4326.cpg
| Name | WGS84 | SWEREF99 |
|---|---|---|
| EPSG | 4326 | 3006 |
| Units | Degrees | Metres |
| WKT | POINT (17.647075 59.839264) | POINT (648337.212857818 6636474.10921653) |
| WKT file size | 76 bytes | 128 bytes |
| geojson file size | 305 bytes | 353 bytes |
| shapefiles (together) | 788 bytes | 1,252 bytes |
Compare the .prj files for the two CRSs:

WGS84:
GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]

SWEREF99:
PROJCS["SWEREF99_TM",GEOGCS["GCS_SWEREF99",DATUM["D_SWEREF99",SPHEROID["GRS_1980",6378137,298.257222101]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",15],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["Meter",1]]
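The SWEREF99_TM parameters above (GRS80 ellipsoid, central meridian 15, scale factor 0.9996, false easting 500000 m) can be checked numerically. The following pure-Python sketch implements a low-order transverse Mercator forward projection; it is an approximation for illustration only (use PROJ/pyproj for real work), but for our point it lands within a few metres of the coordinates QGIS produced:

```python
import math

def sweref99_tm(lat_deg, lon_deg):
    """Approximate WGS84 lat/lon -> SWEREF99 TM (EPSG:3006), low-order series."""
    # GRS80 ellipsoid and SWEREF99_TM parameters, as listed in the .prj above
    a = 6378137.0                      # semi-major axis [m]
    f = 1.0 / 298.257222101            # flattening
    k0, lon0 = 0.9996, math.radians(15.0)
    false_easting, false_northing = 500000.0, 0.0

    e2 = f * (2.0 - f)                 # first eccentricity squared
    ep2 = e2 / (1.0 - e2)              # second eccentricity squared
    e4, e6 = e2 * e2, e2 * e2 * e2
    phi = math.radians(lat_deg)
    dlam = math.radians(lon_deg) - lon0

    sin_p, cos_p = math.sin(phi), math.cos(phi)
    tan_p = sin_p / cos_p
    t2 = tan_p * tan_p
    eta2 = ep2 * cos_p * cos_p
    nu = a / math.sqrt(1.0 - e2 * sin_p * sin_p)  # prime vertical radius
    A = cos_p * dlam

    # meridian arc length from the equator (truncated series)
    M = a * ((1 - e2 / 4 - 3 * e4 / 64 - 5 * e6 / 256) * phi
             - (3 * e2 / 8 + 3 * e4 / 32 + 45 * e6 / 1024) * math.sin(2 * phi)
             + (15 * e4 / 256 + 45 * e6 / 1024) * math.sin(4 * phi)
             - (35 * e6 / 3072) * math.sin(6 * phi))

    easting = false_easting + k0 * nu * (A + (1 - t2 + eta2) * A ** 3 / 6)
    northing = false_northing + k0 * (M + nu * tan_p
                                      * (A ** 2 / 2 + (5 - t2 + 9 * eta2) * A ** 4 / 24))
    return easting, northing

e, n = sweref99_tm(59.839264, 17.647075)
```

Compare e and n with the SWEREF99 WKT in the table above: POINT (648337.21 6636474.11).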
Look up the CRS you want, e.g. SWEREF99
Datum - reference points and the reference surface used to relate the coordinate system to the Earth, e.g. the North American Datum 1983 (NAD83) or the World Geodetic System 1984 (WGS84)
Data stored in GIS are always distorted, contain errors, and are only a representation of the world with estimated positions (see Jere Folgert's video for more).
False northing is a linear value applied to the origin of the y coordinates. False easting and northing values are usually applied to ensure that all x and y values are positive. You can also use the false easting and northing parameters to reduce the range of the x or y coordinate values (more here and here).
To add multiple points, create a .csv:
long,lat
17.6480052,59.8393701
17.6480341,59.8392894
17.6481147,59.8392956
17.6481432,59.8392159
17.6472424,59.8391136
17.6472631,59.8390557
17.6473433,59.8390544
17.6473458,59.8390704
17.6475238,59.8390772
17.647536,59.8389047
17.6474709,59.8389008
17.6474767,59.8388648
17.6473751,59.8388618
17.6473811,59.8387833
17.6484274,59.8388934
17.6484565,59.8388052
17.648535,59.8388155
17.6485727,59.838736
17.647413,59.8386109
17.6474341,59.8385565
17.6475478,59.8385595
17.6475474,59.8385703
17.6478522,59.8385804
17.6478664,59.8383792
17.6475097,59.8383669
17.6475481,59.8382687
17.6486244,59.8383864
17.6486543,59.8383085
17.6487378,59.8383203
17.6487746,59.8382488
17.6476091,59.8381245
17.6476395,59.8380363
17.6480262,59.8380806
17.6481183,59.8378764
17.647737,59.8378336
17.6477676,59.8377667
17.6486223,59.837865
17.6486551,59.8377838
17.6487355,59.8377901
17.6487695,59.8377133
17.6478238,59.8376243
17.6478937,59.83747
17.6479301,59.8374764
17.6479606,59.83739
17.6477726,59.8373749
17.6477583,59.8374074
17.6476953,59.8374007
17.6476289,59.8375505
17.647422,59.8380329
17.6473997,59.8380848
17.6473907,59.8381057
17.6462149,59.8379803
17.6461825,59.8380509
17.6462697,59.8380615
17.6462406,59.8381342
17.6473376,59.8382476
17.6471772,59.838613
17.6466979,59.8385603
17.6467378,59.838468
17.6465739,59.8383313
17.6461422,59.8382847
17.6460358,59.8385507
17.6461231,59.8385563
17.6460883,59.8386495
17.6471381,59.838757
17.6471013,59.8388454
17.6460051,59.838735
17.6459241,59.83898
17.6458227,59.8389721
17.645788,59.8390506
17.6458626,59.8390585
17.6458292,59.8391408
17.6460294,59.8391601
17.6468978,59.8392528
17.6469334,59.8392565
17.6473675,59.8393028
17.6479529,59.839365
17.6480052,59.8393701
Open the csv (lat is y, long is x). Save as a shapefile in 3006.
You can create additional columns. We shall do this using GUI, but if you set up Postgres, you can use SQL queries.

Right-click the layer > attribute table > open field calculator > create fields for x and y
This is how the WKT looks:
WKT,long,lat,x,y
"POINT (648388.848931084 6636488.00213679)",17.648005200000000,59.839370099999996,648389,6636488
"POINT (648390.827050854 6636479.0844731)",17.648034100000000,59.839289399999998,648391,6636479
"POINT (648395.314532686 6636479.95512511)",17.648114700000001,59.839295600000000,648395,6636480
"POINT (648397.265812052 6636471.14787468)",17.648143200000000,59.839215899999999,648397,6636471
"POINT (648347.259564427 6636457.74363919)",17.647242400000000,59.839113599999997,648347,6636458
"POINT (648348.67678627 6636451.34537085)",17.647263100000000,59.839055700000003,648349,6636451
The x, y coordinates are in meters.
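A WKT/CSV export like the one above can be parsed with only the Python standard library. The sketch below assumes the exact column layout shown (the regex is ours) and checks that the rounded x/y attribute columns agree with the WKT geometry:

```python
import csv
import io
import re

# two rows of the QGIS WKT export shown above, inlined for the example
wkt_csv = '''WKT,long,lat,x,y
"POINT (648388.848931084 6636488.00213679)",17.648005200000000,59.839370099999996,648389,6636488
"POINT (648390.827050854 6636479.0844731)",17.648034100000000,59.839289399999998,648391,6636479
'''

point_re = re.compile(r"POINT \(([-\d.]+) ([-\d.]+)\)")

points = []
for row in csv.DictReader(io.StringIO(wkt_csv)):
    m = point_re.match(row["WKT"])
    x, y = float(m.group(1)), float(m.group(2))
    # the rounded x/y attribute columns should agree with the WKT geometry
    assert round(x) == int(row["x"]) and round(y) == int(row["y"])
    points.append((x, y))
```

On real exports you would read the file with open(...) instead of io.StringIO.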
Polygons
Let's create a polygon in QGIS; make myPolygon.csv:
WKT,gid
"MULTIPOLYGON (((17.6453 59.8395,17.649 59.8395,17.649 59.8373,17.6453 59.8373,17.6453 59.8395)))",111
This is the GeoJSON of the same polygon, reprojected to SWEREF99 (EPSG:3006):
{
"type": "FeatureCollection",
"name": "myPolygon3006",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:EPSG::3006" } },
"features": [
{ "type": "Feature", "properties": { "gid": 111 }, "geometry": { "type": "MultiPolygon", "coordinates": [ [ [ [ 648236.730797635158524, 6636496.404087456874549 ], [ 648443.997444280423224, 6636504.689628538675606 ], [ 648453.793062769225799, 6636259.816340153105557 ], [ 648246.51272902963683, 6636251.530436812900007 ], [ 648236.730797635158524, 6636496.404087456874549 ] ] ] ] } }
]
}
Note how the first and last point coordinates are the same.
To add a polyline, create myPolyline.csv:
WKT,full_id
"MULTILINESTRING ((17.6453 59.8395,17.649 59.8395,17.649 59.8373,17.6453 59.8373))",6
GeoJSON of the polyline reprojected to SWEREF99:
{
"type": "FeatureCollection",
"name": "theLine6662_3006",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:EPSG::3006" } },
"features": [
{ "type": "Feature", "properties": { "full_id": 6 }, "geometry": { "type": "MultiLineString", "coordinates": [ [ [ 648236.730797635158524, 6636496.404087456874549 ], [ 648443.997444280423224, 6636504.689628538675606 ], [ 648453.793062769225799, 6636259.816340153105557 ], [ 648246.51272902963683, 6636251.530436812900007 ] ] ] } }
]
}
Creating geometry
Drawing geometry
The order:
1. add a new layer (polygon, 3006)
2. turn on editing for the new layer
3. add a new polygon
4. click to draw
5. right-click to finalise and fill in the attributes
Buffer
Create a new geometry offset by a distance of 13 m.
Bounding Box
Create a new geometry from the bounding box (BB) of the buffer layer.
For point geometries this is done using "minimum bounding geometry".
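The bounding-box computation itself is just the minimum and maximum of the coordinates. A minimal Python sketch, using a few of the Uppsala points from the CSV above:

```python
def bounding_box(points):
    """Axis-aligned minimum bounding box of a point set: (xmin, ymin, xmax, ymax)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)

# a few of the Uppsala points from the CSV above, as (long, lat)
pts = [(17.6480052, 59.8393701), (17.6472424, 59.8391136), (17.6487746, 59.8382488)]
bbox = bounding_box(pts)
```

In a projected CRS like 3006 the same code works on (easting, northing) in metres.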
There are plenty of such functions; those mentioned here are commonly used for spatial analysis. Another useful one:
Voronoi polygons
To try yourself: create a shapefile in 3006 of Voronoi polygons (select some buffer distance) and add columns with area, perimeter, and an autoincremented gid.

Check out basic statistics and histogram for area field
Simple queries
Select by attribute
The mean value of the area was ≈ 749.8. Select only the polygons with an area larger than the mean:
Save as a separate shapefile (same as usual but check "selected only")
Predicates
Understanding spatial joins
Put very nicely by Boundlessgeo: 11. Spatial Relationships
From ESRI: the IRelationalOperator Interface, where the logic is based on the geometry elements:
In Magellan
Intersects
Intersects returns t (TRUE) if the intersection does not result in an empty set. Intersects returns the exact opposite result of disjoint.
Within
Within returns t (TRUE) if the first geometry is completely within the second geometry. Within tests for the exact opposite result of contains.
Contains
Contains returns t (TRUE) if the second geometry is completely contained by the first geometry. The contains predicate returns the exact opposite result of the within predicate.
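These predicate definitions can be sketched in plain Python. The ray-casting test below handles strictly interior points only (boundary cases need more care), which is enough to show that contains and within are the same question asked with the arguments swapped:

```python
def point_in_polygon(pt, polygon):
    """Ray casting: True if pt is strictly inside polygon (a list of (x, y) vertices)."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def within(point, polygon):
    return point_in_polygon(point, polygon)

def contains(polygon, point):
    return within(point, polygon)  # contains is within with the arguments swapped

square = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]
assert within((0.0, 0.0), square) and contains(square, (0.0, 0.0))
assert not within((2.0, 0.0), square)
```

A full intersects predicate would additionally accept points on the boundary.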
Not in Magellan, from ESRI
Equal
Disjoint
Touch
Overlap
Cross
Source: ESRI Understanding spatial relations
Within
Equivalent PostgreSQL query:
SELECT *
FROM points3006try AS a
INNER JOIN myPolygon2 AS b
ON st_within(a.geom, b.geom)
Contains
Equivalent PostgreSQL query:
SELECT *
FROM largerVoronoi AS a
INNER JOIN extractedRedPoints AS b
ON st_contains(b.geom, a.geom)
Intersects
Equivalent PostgreSQL query:
SELECT *
FROM largerVoronoi AS a
INNER JOIN myPolygon2 AS b
ON st_intersects(a.geom, b.geom)
Even though this says INNER JOIN, conceptually a Cartesian join is performed first and then the undesired results are filtered out.
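A naive illustration of these join semantics in Python: build the Cartesian product of the two relations first, then filter on the predicate. This is what the logical plan says, not how Magellan actually executes it (it avoids the full product with spatial indexing, as the linked article explains), and the cheap bounding-box predicate here is our stand-in, not Magellan's:

```python
from itertools import product

points = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (5.0, 5.0)]
polygons = {"unit_square": [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]}

def bbox_intersects(pt, poly):
    """Cheap stand-in predicate: is pt inside the polygon's bounding box (inclusive)?"""
    xs = [v[0] for v in poly]
    ys = [v[1] for v in poly]
    return min(xs) <= pt[0] <= max(xs) and min(ys) <= pt[1] <= max(ys)

# "join": Cartesian product of the two relations first, then filter on the predicate
joined = [(pt, name)
          for pt, (name, poly) in product(points, polygons.items())
          if bbox_intersects(pt, poly)]
```

With n points and m polygons this does n*m predicate evaluations, which is exactly what a scalable engine must avoid.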
displayHTML(frameIt("https://magellan.ghost.io/how-does-magellan-scale-geospatial-queries", 550))
Besides the predicates
GIS traditionally includes two types of spatial joins:
- IRelationalOperator Interface: the result is boolean (e.g. yes it intersects, or it doesn't) and thus joined data (e.g. attributes of the polygon within which the points are)
- ITopologicalOperator Interface: the result is geometry (e.g. the intersection)

Joins of the second type include intersection, difference, union, etc. geometries (more here).
OpenStreetMap (OSM)
For now, we download it as a shapefile and play with it in QGIS.
Go to http://extract.bbbike.org/, zoom in, select the desired area, and extract.
You should get an email from bbbike with the data:
Download and unzip in your local folder
displayHTML(frameIt("https://wiki.openstreetmap.org/wiki/Map_Features", 550))
Open the building layer in QGIS
Change the projection to 3006. Let us explore the OSM data
OSM contains more data than what we got from bbbike. We can also display OSM raster tiles (if the connection works, you should be able to do this locally).

To work with OSM properly in QGIS you need plugins. I can try to show you, pending success in installing the plugins. Try on your own:

Some advanced operations: distance matrix, nearest neighbour, points-in-polygon, mean coordinates, ...
Count points in polygon 
To download data in .osm format for Magellan, do the following (we will use this later):
1. We define an area of interest and find the coordinates of its boundary, AKA its "bounding box". To do this, go to https://www.openstreetmap.org and zoom roughly into the desired area.
2. To ingest data from OSM we use wget, in the following format:
wget -O MyFileName.osm "https://api.openstreetmap.org/api/0.6/map?bbox=l,b,r,t"
- MyFileName.osm - give it some informative file name
- l = longitude of the LEFT boundary of the bounding box
- b = latitude of the BOTTOM boundary of the bounding box
- r = longitude of the RIGHT boundary of the bounding box
- t = latitude of the TOP boundary of the bounding box
For instance, if you know the bounding box, do:
- TinyUppsalaCentrumWgot.osm - a tiny area in Uppsala Centrum
- l = 17.63514
- b = 59.85739
- r = 17.64154
- t = 59.86011
wget -O TinyUppsalaCentrumWgot.osm "https://api.openstreetmap.org/api/0.6/map?bbox=17.63514,59.85739,17.64154,59.86011"
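The URL is just string formatting over the four bounds; a small Python sketch (the helper name is ours, not part of any library):

```python
def osm_bbox_url(left, bottom, right, top):
    """Build an OSM API 0.6 map-export URL; the bbox order is left,bottom,right,top."""
    return ("https://api.openstreetmap.org/api/0.6/map"
            f"?bbox={left},{bottom},{right},{top}")

# the TinyUppsalaCentrum bounding box from above
url = osm_bbox_url(17.63514, 59.85739, 17.64154, 59.86011)
```

Getting the order wrong (e.g. lat before long) is a very common bbox mistake, so a helper like this is worth having.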
Check out the NYC Taxi Dataset in Magellan
This is a much larger dataset and we may need access to a larger cluster - unless we just analyse a smaller subset of the data (perhaps just a month of Taxi rides in NYC). We can understand the same concepts using a much smaller dataset of Uber rides in San Francisco. We will analyse this next.
The taxi data can be downloaded from here
Let's have a look at the NY neighbourhoods dataset (right-click save or wget/curl): https://github.com/harsha2010/magellan/raw/master/examples/datasets/NYC-NEIGHBORHOODS/neighborhoods.geojson
Open it in QGIS and open attribute table
Note for Spark 2.4.5
The current (2022-02-01) latest maven coordinates for Magellan do not work for Spark 2.4+, and Spark 3.0+ is not supported yet.
Use the binary jar from (NEW JAR TO BE UPLOADED!) https://github.com/lamastex/scalable-data-science/tree/master/custom-builds/jars/magellan/forks on a Databricks Runtime 6.6 (Apache Spark 2.4.5, Scala 2.11) cluster.
Instructions
- Download (NEW JAR TO BE UPLOADED!) https://github.com/lamastex/scalable-data-science/raw/master/custom-builds/jars/magellan/forks/magellan_2.11-1.0.7-SNAPSHOT.jar to your local machine.
- In Databricks choose Create -> Library and upload the packaged jar.
- Create a Spark 2.4.5, Scala 2.11 cluster with the uploaded Magellan library installed. If you are already running a cluster and have installed the uploaded library on it, you have to detach and re-attach any notebook currently using that cluster.
NOTE: The Magellan library's usual maven coordinates harsha2010:magellan:1.0.6-s_2.11 may be outdated, but they are kept here for future reference. You can follow the instructions here to assemble the master jar if needed: * https://github.com/lamastex/scalable-data-science/raw/master/custom-builds/jars/magellan/master
Some Concrete Examples of Scalable Geospatial Analytics
Let us check out cross-domain data fusion in MSR's Urban Computing Group
- lots of interesting papers to read at http://research.microsoft.com/en-us/projects/urbancomputing/.
Several sciences are naturally geospatial
- forestry,
- geography,
- geology,
- seismology,
- ecology,
- etc. etc.
See, for example, the global earthquake data streams from the US Geological Survey below.
For a global data source, see the US Geological Survey's Earthquake Hazards Program: http://earthquake.usgs.gov/data/.
REDO
https://magellan.ghost.io/how-does-magellan-scale-geospatial-queries/
Introduction to Magellan for Scalable Geospatial Analytics
This is a minor augmentation of Ram Harsha's Magellan code blogged here: * magellan geospatial analytics in spark
def frameIt( u:String, h:Int ) : String = {
"""<iframe
src=""""+ u+""""
width="95%" height="""" + h + """"
sandbox>
<p>
<a href="http://spark.apache.org/docs/latest/index.html">
Fallback link for browsers that, unlikely, don't support frames
</a>
</p>
</iframe>"""
}
displayHTML(frameIt("https://magellan.ghost.io/how-does-magellan-scale-geospatial-queries/", 550))
Nuts and Bolts of Magellan
This is an expansion of the following databricks notebook: * https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/137058993011870/882779309834027/6891974485343070/latest.html
Also look at the magellan README on github: * https://github.com/harsha2010/magellan
HOMEWORK: Watch the magellan presentation by Ram Harsha (Hortonworks) in Spark Summit East 2016.
Other resources for magellan: * Ram's blog in HortonWorks and the ZeppelinHub view of the demo code in video above * Magellan as Spark project and Magellan github source * shape files developed by Environmental Systems Research Institute (ESRI). See ESRI's what is a geospatial shape file? * magellan builds on http://esri.github.io/ a leading opensource geospatial library
Let's get our hands dirty with basics in magellan.
Spatial Data Structures
- Points
- Polygons
- Lines
- Polylines
Users' View of Spatial Data Structures (details are typically "invisible" to user)
Predicates
- within
- intersects
- contains
// create a points DataFrame
val points = sc.parallelize(Seq((-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0))).toDF("x", "y")
points: org.apache.spark.sql.DataFrame = [x: double, y: double]
// transform (x,y) coordinate pairs into Point using a custom user-defined function
import magellan.Point // just Point
import org.apache.spark.sql.functions.udf
val toPointUDF = udf{(x:Double,y:Double) => Point(x,y) }
import magellan.Point
import org.apache.spark.sql.functions.udf
toPointUDF: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,org.apache.spark.sql.types.PointUDT@37548d6,Some(List(DoubleType, DoubleType)))
// let's show the results of the DF with a new column called point
points.withColumn("point", toPointUDF($"x", $"y")).show()
+----+----+-----------------+
| x| y| point|
+----+----+-----------------+
|-1.0|-1.0|Point(-1.0, -1.0)|
|-1.0| 1.0| Point(-1.0, 1.0)|
| 1.0|-1.0| Point(1.0, -1.0)|
+----+----+-----------------+
points.show
+----+----+
| x| y|
+----+----+
|-1.0|-1.0|
|-1.0| 1.0|
| 1.0|-1.0|
+----+----+
// Let's instead use the built-in expression to do the same - it's much faster on larger DataFrames due to code-gen
import org.apache.spark.sql.magellan.dsl.expressions._
val points = sc.parallelize(Seq((-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0))).toDF("x", "y").select(point($"x", $"y").as("point"))
points.show()
+-----------------+
| point|
+-----------------+
|Point(-1.0, -1.0)|
| Point(-1.0, 1.0)|
| Point(1.0, -1.0)|
+-----------------+
import org.apache.spark.sql.magellan.dsl.expressions._
points: org.apache.spark.sql.DataFrame = [point: point]
display(points) // busted in bleeding-edge magellan we need for computing
| point |
|---|
| Point(-1.0, -1.0) |
| Point(-1.0, 1.0) |
| Point(1.0, -1.0) |
The latest version of magellan seems to have issues with the databricks display function. We will forgo this convenience and continue with our analysis.
This is a databricks display of magellan points when it is working properly in Spark 2.2.
Let's verify empirically if it is indeed faster for larger DataFrames.
// to generate a sequence of pairs of random numbers we can do:
import util.Random.nextDouble
Seq.fill(10)((-1.0*nextDouble,+1.0*nextDouble))
import util.Random.nextDouble
res7: Seq[(Double, Double)] = List((-0.4443119444291961,0.4405777408068594), (-0.3157738948550728,0.5017025352914497), (-0.9874637771136913,0.8519623151075828), (-0.9985464893592637,0.4278396438594432), (-0.3159292114428177,0.030487965422646535), (-0.7513798362079374,0.38194689908898793), (-0.507758712332592,0.7369770528847904), (-0.6697906990106479,0.6636420894550961), (-0.12535584996134563,0.6249808031956755), (-0.6102666766697349,0.6205652158691838))
// using the UDF method with 1 million points we can do a count action of the DF with point column
// don't add too many zeros as it may crash your driver program
sc.parallelize(Seq.fill(100000)((-1.0*nextDouble,+1.0*nextDouble)))
.toDF("x", "y")
.withColumn("point", toPointUDF('x, 'y))
.count()
res8: Long = 100000
// it should be twice as fast with code-gen especially when we are ingesting from dbfs as opposed to
// using Seq.fill in the driver...
sc.parallelize(Seq.fill(100000)((-1.0*nextDouble,+1.0*nextDouble)))
.toDF("x", "y")
.withColumn("point", point('x, 'y))
.count()
res9: Long = 100000
Creating 100,000 points using the UDF method takes 3.12 seconds, while using Magellan's built-in point expression takes 1.35 seconds.
See https://databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer.html
Read the following for more on catalyst optimizer and whole-stage code generation.
- https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-whole-stage-codegen.html
- https://databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer.html
- https://databricks.com/blog/2016/05/23/apache-spark-as-a-compiler-joining-a-billion-rows-per-second-on-a-laptop.html
Try bench-marks here: * https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/6122906529858466/293651311471490/5382278320999420/latest.html
// Create a Polygon DataFrame
import magellan.Polygon
case class PolygonExample(polygon: Polygon)
// do this in your head / pencil-paper / black-board going counter-clockwise
val ring = Array(Point(1.0, 1.0), Point(1.0, -1.0), Point(-1.0, -1.0), Point(-1.0, 1.0), Point(1.0, 1.0))
val polygon = Polygon(Array(0), ring)
val polygons = sc.parallelize(Seq(
PolygonExample(Polygon(Array(0), ring))
)).toDF()
import magellan.Polygon
defined class PolygonExample
ring: Array[magellan.Point] = Array(Point(1.0, 1.0), Point(1.0, -1.0), Point(-1.0, -1.0), Point(-1.0, 1.0), Point(1.0, 1.0))
polygon: magellan.Polygon = magellan.Polygon@427f1ce6
polygons: org.apache.spark.sql.DataFrame = [polygon: polygon]
polygons.show(false)
+-------------------------+
|polygon |
+-------------------------+
|magellan.Polygon@fc63bb26|
+-------------------------+
display(polygons) // not much can be seen as its in the object
| polygon |
|---|
| magellan.Polygon@63336515 |
This is a databricks display of magellan polygon when it is working properly in Spark 2.2 on another databricks run-time.
import org.apache.spark.sql.types._
import org.apache.spark.sql.types._
// join points with polygons upon intersection
points.join(polygons)
.where($"point" intersects $"polygon")
.count()
res13: Long = 3
points.show()
+-----------------+
| point|
+-----------------+
|Point(-1.0, -1.0)|
| Point(-1.0, 1.0)|
| Point(1.0, -1.0)|
+-----------------+
Pop Quiz:
Which of the three points intersect the polygon?
More generally we can have more complex queries as the generic polygon need not even be a convex set.
This is not an uncommon polygon - think of shapes of parks or lakes on a map.
A bounding box for a non-convex polygon
Let us consider our simple points and polygons we just made and consider the following points within polygon join query.
// join points with polygons upon within or containment
points.join(polygons)
.where($"point" within $"polygon")
.count()
res17: Long = 0
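The count of 0 is consistent with the definition of within: all three points are vertices of the polygon's ring, so they lie on its boundary rather than strictly inside it; hence they intersect the polygon but are not within it. A quick pure-Python check (hypothetical helper) that each point lies on an edge of the ring:

```python
def on_segment(pt, a, b, eps=1e-12):
    """True if pt lies on the segment a-b: collinear and inside the segment's bounds."""
    (x, y), (x1, y1), (x2, y2) = pt, a, b
    cross = (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)
    if abs(cross) > eps:  # not collinear with the segment
        return False
    return min(x1, x2) - eps <= x <= max(x1, x2) + eps and \
           min(y1, y2) - eps <= y <= max(y1, y2) + eps

# the same ring as the polygon above, and the three points from the join
ring = [(1.0, 1.0), (1.0, -1.0), (-1.0, -1.0), (-1.0, 1.0)]
edges = [(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring))]

pts = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0)]
on_boundary = [any(on_segment(p, a, b) for a, b in edges) for p in pts]
```

Since every point is on the boundary, intersects counts all three while within counts none.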
//creating line from two points
import magellan.Line
case class LineExample(line: Line)
val line = Line(Point(1.0, 1.0), Point(1.0, -1.0))
val lines = sc.parallelize(Seq(
LineExample(line)
)).toDF()
lines.show(false)
+---------------------------------------+
|line |
+---------------------------------------+
|Line(Point(1.0, 1.0), Point(1.0, -1.0))|
+---------------------------------------+
import magellan.Line
defined class LineExample
line: magellan.Line = Line(Point(1.0, 1.0), Point(1.0, -1.0))
lines: org.apache.spark.sql.DataFrame = [line: line]
display(lines)
| line |
|---|
| Line(Point(1.0, 1.0), Point(1.0, -1.0)) |
This is a databricks display of magellan lines when it is working properly!
// creating polyline
import magellan.PolyLine
case class PolyLineExample(polyline: PolyLine)
val ring = Array(Point(1.0, 1.0), Point(1.0, -1.0),
Point(-1.0, -1.0), Point(-1.0, 1.0))
val polylines = sc.parallelize(Seq(
PolyLineExample(PolyLine(Array(0), ring))
)).toDF()
import magellan.PolyLine
defined class PolyLineExample
ring: Array[magellan.Point] = Array(Point(1.0, 1.0), Point(1.0, -1.0), Point(-1.0, -1.0), Point(-1.0, 1.0))
polylines: org.apache.spark.sql.DataFrame = [polyline: polyline]
polylines.show(false)
+--------------------------+
|polyline |
+--------------------------+
|magellan.PolyLine@6cc77052|
+--------------------------+
This is a databricks display of magellan polyline when it is working properly!
// now let's make a polyline with two or more lines out of the same ring
val polylines2 = sc.parallelize(Seq(
PolyLineExample(PolyLine(Array(0,2), ring)) // first line starts at index 0 and second one starts at index 2
)).toDF()
polylines2.show(false)
+--------------------------+
|polyline |
+--------------------------+
|magellan.PolyLine@43efee0d|
+--------------------------+
polylines2: org.apache.spark.sql.DataFrame = [polyline: polyline]
import magellan.Point
val p = Point(1.0, -1.0)
import magellan.Point
p: magellan.Point = Point(1.0, -1.0)
//p. // uncomment line and put the cursor next to the . and hit TAB to see available methods on the magellan Point p
(p.getX, p.getY) // for example we can getX and getY values of the Point p
res26: (Double, Double) = (1.0,-1.0)
val pc = Point(0.0,0.0)
p.withinCircle(pc, 5.0) // check if Point p is within the circle of radius 5.0 around Point pc
pc: magellan.Point = Point(0.0, 0.0)
res27: Boolean = true
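`withinCircle` presumably reduces to a Euclidean distance check against the radius; here is a minimal pure-Scala sketch of that idea (an assumption about the semantics, not magellan's source):

```scala
// Is point (px, py) within the circle of radius r centred at (cx, cy)?
// math.hypot computes sqrt(dx*dx + dy*dy) without intermediate overflow.
def withinCircle(px: Double, py: Double, cx: Double, cy: Double, r: Double): Boolean =
  math.hypot(px - cx, py - cy) <= r

withinCircle(1.0, -1.0, 0.0, 0.0, 5.0) // distance is sqrt(2), so true
withinCircle(1.0, -1.0, 0.0, 0.0, 1.0) // false
```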
p.boundingBox // find the bounding box of p
res28: magellan.BoundingBox = BoundingBox(1.0,-1.0,1.0,-1.0)
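For a single point the bounding box degenerates to the point itself, as seen above; for a set of points it is just the coordinate-wise min and max. A minimal pure-Scala sketch:

```scala
// Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a point set.
def boundingBox(pts: Seq[(Double, Double)]): (Double, Double, Double, Double) = {
  val xs = pts.map(_._1)
  val ys = pts.map(_._2)
  (xs.min, ys.min, xs.max, ys.max)
}

// The box spanned by our ring's corners:
boundingBox(Seq((1.0, 1.0), (1.0, -1.0), (-1.0, -1.0), (-1.0, 1.0)))
// (-1.0, -1.0, 1.0, 1.0)
```

Bounding boxes are the cheap first-pass filter in spatial joins: only candidates whose boxes overlap need the expensive exact polygon test.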
import magellan.Point
// create a radius 0.5 buffered polygon about the centre given by Point(0.0, 1.0)
val aBufferedPolygon = Point(0.0, 1.0).buffer(0.5)
magellan.esri.ESRIUtil.toESRIGeometry(aBufferedPolygon)
println(aBufferedPolygon)
magellan.Polygon@9f249027
import magellan.Point
aBufferedPolygon: magellan.Polygon = magellan.Polygon@9f249027
Dive here for more on magellan Point:
- https://github.com/harsha2010/magellan/blob/master/src/main/scala/magellan/Point.scala
Knock yourself out on other Data Structures in the source.
Uber Trajectories in San Francisco
Dataset for the demo done by Ram Sri Harsha at Spark Summit Europe 2015
First the datasets have to be loaded into distributed file store.
- See Step 0: Downloading datasets and loading into dbfs below for doing this anew (This only needs to be done once if the data is persisted in the distributed file system).
After downloading the data, we expect to have the following files in distributed file system (dbfs):
- `all.tsv` is the file of all Uber trajectories
- `SFNbhd` is the directory containing the SF neighborhood shape files
// display the contents of the dbfs directory "dbfs:/datasets/magellan/"
// - if you don't see files here then go to Step 0 below as explained above!
display(dbutils.fs.ls("dbfs:/datasets/magellan/"))
| path | name | size |
|---|---|---|
| dbfs:/datasets/magellan/SFNbhd/ | SFNbhd/ | 0.0 |
| dbfs:/datasets/magellan/all.tsv | all.tsv | 6.0947802e7 |
ls /dbfs/datasets
alexandria
beijing
magellan
maps
mobile_sample
osm
sou
t-drive-trips
t-drive-trips-magellan
taxis
First five lines or rows of the uber data containing: tripId, timestamp, Lat, Lon
sc.textFile("dbfs:/datasets/magellan/all.tsv").take(5).foreach(println)
00001 2007-01-07T10:54:50+00:00 37.782551 -122.445368
00001 2007-01-07T10:54:54+00:00 37.782745 -122.444586
00001 2007-01-07T10:54:58+00:00 37.782842 -122.443688
00001 2007-01-07T10:55:02+00:00 37.782919 -122.442815
00001 2007-01-07T10:55:06+00:00 37.782992 -122.442112
The neighborhood shape files for San Francisco will form the polygons of interest to us.
The shapefile format can spatially describe vector features: points, lines, and polygons, representing, for example, water wells, rivers, and lakes. Each item usually has attributes that describe it, such as name or temperature.
display(dbutils.fs.ls("dbfs:/datasets/magellan/SFNbhd")) // legacy shape files - used in various sectors
| path | name | size |
|---|---|---|
| dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.dbf | planning_neighborhoods.dbf | 1028.0 |
| dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.prj | planning_neighborhoods.prj | 567.0 |
| dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.sbn | planning_neighborhoods.sbn | 516.0 |
| dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.sbx | planning_neighborhoods.sbx | 164.0 |
| dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.shp | planning_neighborhoods.shp | 214576.0 |
| dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.shp.xml | planning_neighborhoods.shp.xml | 21958.0 |
| dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.shx | planning_neighborhoods.shx | 396.0 |
Homework
First watch the more technical magellan presentation by Ram Sri Harsha (Hortonworks) in Spark Summit Europe 2015
[Ram Sri Harsha's Magellan Spark Summit EU 2015 Talk](https://www.youtube.com/watch?v=rP8H-xQTuM0)
Let's repeat Ram's original analysis from the following blog as done below.
This is just to get you started... You may need to modify this!
case class UberRecord(tripId: String, timestamp: String, point: Point) // a case class for UberRecord
defined class UberRecord
val uber = sc.textFile("dbfs:/datasets/magellan/all.tsv")
.map { line =>
val parts = line.split("\t" )
val tripId = parts(0)
val timestamp = parts(1)
val point = Point(parts(3).toDouble, parts(2).toDouble)
UberRecord(tripId, timestamp, point)
}
//.repartition(100) // using default repartition
.toDF()
.cache()
uber: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [tripId: string, timestamp: string ... 1 more field]
val uberRecordCount = uber.count() // how many Uber records?
uberRecordCount: Long = 1128663
So there are over a million UberRecords.
sqlContext.read.format("magellan").load("dbfs:/datasets/magellan/SFNbhd/").printSchema()
root
|-- point: point (nullable = true)
|-- polyline: polyline (nullable = true)
|-- polygon: polygon (nullable = true)
|-- metadata: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
|-- valid: boolean (nullable = true)
val neighborhoods = sqlContext.read.format("magellan")
.load("dbfs:/datasets/magellan/SFNbhd/")
.select($"polygon", $"metadata")
.cache()
neighborhoods: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [polygon: polygon, metadata: map<string,string>]
neighborhoods.count() // how many neighbourhoods in SF?
res36: Long = 37
neighborhoods.printSchema
root
|-- polygon: polygon (nullable = true)
|-- metadata: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
neighborhoods.show(2,false) // see the first two neighbourhoods
+-------------------------+-----------------------------------------+
|polygon |metadata |
+-------------------------+-----------------------------------------+
|magellan.Polygon@5e8b7382|[neighborho -> Twin Peaks ]|
|magellan.Polygon@aefbe87e|[neighborho -> Pacific Heights ]|
+-------------------------+-----------------------------------------+
only showing top 2 rows
You Try:
Modify the next cell to see all 37 neighborhoods.
neighborhoods.show(37,false) // modify this cell to see all 37 neighborhoods
+-------------------------+-----------------------------------------+
|polygon |metadata |
+-------------------------+-----------------------------------------+
|magellan.Polygon@9a519148|[neighborho -> Twin Peaks ]|
|magellan.Polygon@2d5e862b|[neighborho -> Pacific Heights ]|
|magellan.Polygon@eafc4a01|[neighborho -> Visitacion Valley ]|
|magellan.Polygon@b87b053f|[neighborho -> Potrero Hill ]|
|magellan.Polygon@a90162d5|[neighborho -> Crocker Amazon ]|
|magellan.Polygon@bb49ff9c|[neighborho -> Outer Mission ]|
|magellan.Polygon@fb06b113|[neighborho -> Bayview ]|
|magellan.Polygon@bafd0911|[neighborho -> Lakeshore ]|
|magellan.Polygon@ad89232d|[neighborho -> Russian Hill ]|
|magellan.Polygon@b3c46f20|[neighborho -> Golden Gate Park ]|
|magellan.Polygon@5ff06533|[neighborho -> Outer Sunset ]|
|magellan.Polygon@fa2cc9b5|[neighborho -> Inner Sunset ]|
|magellan.Polygon@6beaa40b|[neighborho -> Excelsior ]|
|magellan.Polygon@2befcdb6|[neighborho -> Outer Richmond ]|
|magellan.Polygon@7f2f3423|[neighborho -> Parkside ]|
|magellan.Polygon@16cde909|[neighborho -> Bernal Heights ]|
|magellan.Polygon@dd6fd499|[neighborho -> Noe Valley ]|
|magellan.Polygon@965ebd1c|[neighborho -> Presidio ]|
|magellan.Polygon@6e73c0c2|[neighborho -> Nob Hill ]|
|magellan.Polygon@686b88b |[neighborho -> Financial District ]|
|magellan.Polygon@a0d10f1b|[neighborho -> Glen Park ]|
|magellan.Polygon@335cadb5|[neighborho -> Marina ]|
|magellan.Polygon@4eac537f|[neighborho -> Seacliff ]|
|magellan.Polygon@5b75bfd9|[neighborho -> Mission ]|
|magellan.Polygon@5e99ea57|[neighborho -> Downtown/Civic Center ]|
|magellan.Polygon@d22d0489|[neighborho -> South of Market ]|
|magellan.Polygon@6aaf808d|[neighborho -> Presidio Heights ]|
|magellan.Polygon@5f470ef3|[neighborho -> Inner Richmond ]|
|magellan.Polygon@7dba9eb4|[neighborho -> Castro/Upper Market ]|
|magellan.Polygon@b9501895|[neighborho -> West of Twin Peaks ]|
|magellan.Polygon@b213687c|[neighborho -> Ocean View ]|
|magellan.Polygon@766d6fd4|[neighborho -> Treasure Island/YBI ]|
|magellan.Polygon@48c45968|[neighborho -> Chinatown ]|
|magellan.Polygon@d2c56329|[neighborho -> Western Addition ]|
|magellan.Polygon@c92a684c|[neighborho -> North Beach ]|
|magellan.Polygon@ce8caa28|[neighborho -> Diamond Heights ]|
|magellan.Polygon@aac6c49d|[neighborho -> Haight Ashbury ]|
+-------------------------+-----------------------------------------+
import org.apache.spark.sql.functions._ // this is needed for sql functions like explode, etc.
import org.apache.spark.sql.functions._
//names of all 37 neighborhoods of San Francisco
neighborhoods.select(explode($"metadata").as(Seq("k", "v"))).show(37,false)
+----------+-------------------------+
|k |v |
+----------+-------------------------+
|neighborho|Twin Peaks |
|neighborho|Pacific Heights |
|neighborho|Visitacion Valley |
|neighborho|Potrero Hill |
|neighborho|Crocker Amazon |
|neighborho|Outer Mission |
|neighborho|Bayview |
|neighborho|Lakeshore |
|neighborho|Russian Hill |
|neighborho|Golden Gate Park |
|neighborho|Outer Sunset |
|neighborho|Inner Sunset |
|neighborho|Excelsior |
|neighborho|Outer Richmond |
|neighborho|Parkside |
|neighborho|Bernal Heights |
|neighborho|Noe Valley |
|neighborho|Presidio |
|neighborho|Nob Hill |
|neighborho|Financial District |
|neighborho|Glen Park |
|neighborho|Marina |
|neighborho|Seacliff |
|neighborho|Mission |
|neighborho|Downtown/Civic Center |
|neighborho|South of Market |
|neighborho|Presidio Heights |
|neighborho|Inner Richmond |
|neighborho|Castro/Upper Market |
|neighborho|West of Twin Peaks |
|neighborho|Ocean View |
|neighborho|Treasure Island/YBI |
|neighborho|Chinatown |
|neighborho|Western Addition |
|neighborho|North Beach |
|neighborho|Diamond Heights |
|neighborho|Haight Ashbury |
+----------+-------------------------+
This join below yields nothing.
So what's going on?
Watch Ram's 2015 Spark Summit talk for details on geospatial formats and transformations.
neighborhoods
.join(uber)
.where($"point" within $"polygon")
.select($"tripId", $"timestamp", explode($"metadata").as(Seq("k", "v")))
.withColumnRenamed("v", "neighborhood")
.drop("k")
.show(5)
+------+---------+------------+
|tripId|timestamp|neighborhood|
+------+---------+------------+
+------+---------+------------+
We need the right transformer to convert the points into the coordinate system of the shape files.
displayHTML(frameIt("https://en.wikipedia.org/wiki/North_American_Datum#North_American_Datum_of_1983",400))
// This code was removed from magellan in this commit:
// https://github.com/harsha2010/magellan/commit/8df0a62560116f8ed787fc7e86f190f8e2730826
// We bring this back to show how to roll our own transformations.
// EXERCISE: find existing transformers / methods in magellan or esri to go between coordinate systems
import magellan.Point
class NAD83(params: Map[String, Any]) {
val RAD = 180d / Math.PI
val ER = 6378137.toDouble // semi-major axis for GRS-80
val RF = 298.257222101 // reciprocal flattening for GRS-80
val F = 1.toDouble / RF // flattening for GRS-80
val ESQ = F + F - (F * F)
val E = StrictMath.sqrt(ESQ)
private val ZONES = Map(
401 -> Array(122.toDouble, 2000000.0001016,
500000.0001016001, 40.0,
41.66666666666667, 39.33333333333333),
403 -> Array(120.5, 2000000.0001016,
500000.0001016001, 37.06666666666667,
38.43333333333333, 36.5)
)
def from() = {
val zone = params("zone").asInstanceOf[Int]
ZONES.get(zone) match {
case Some(x) => if (x.length == 5) {
toTransverseMercator(x)
} else {
toLambertConic(x)
}
case None => ???
}
}
def to() = {
val zone = params("zone").asInstanceOf[Int]
ZONES.get(zone) match {
case Some(x) => if (x.length == 5) {
fromTransverseMercator(x)
} else {
fromLambertConic(x)
}
case None => ???
}
}
def qqq(e: Double, s: Double) = {
(StrictMath.log((1 + s) / (1 - s)) - e *
StrictMath.log((1 + e * s) / (1 - e * s))) / 2
}
def toLambertConic(params: Array[Double]) = {
val cm = params(0) / RAD // CENTRAL MERIDIAN (CM)
val eo = params(1) // FALSE EASTING VALUE AT THE CM (METERS)
val nb = params(2) // FALSE NORTHING VALUE AT SOUTHERNMOST PARALLEL (METERS), (USUALLY ZERO)
val fis = params(3) / RAD // LATITUDE OF SO. STD. PARALLEL
val fin = params(4) / RAD // LATITUDE OF NO. STD. PARALLEL
val fib = params(5) / RAD // LATITUDE OF SOUTHERNMOST PARALLEL
val sinfs = StrictMath.sin(fis)
val cosfs = StrictMath.cos(fis)
val sinfn = StrictMath.sin(fin)
val cosfn = StrictMath.cos(fin)
val sinfb = StrictMath.sin(fib)
val qs = qqq(E, sinfs)
val qn = qqq(E, sinfn)
val qb = qqq(E, sinfb)
val w1 = StrictMath.sqrt(1.toDouble - ESQ * sinfs * sinfs)
val w2 = StrictMath.sqrt(1.toDouble - ESQ * sinfn * sinfn)
val sinfo = StrictMath.log(w2 * cosfs / (w1 * cosfn)) / (qn - qs)
val k = ER * cosfs * StrictMath.exp(qs * sinfo) / (w1 * sinfo)
val rb = k / StrictMath.exp(qb * sinfo)
(point: Point) => {
val (long, lat) = (point.getX(), point.getY())
val l = - long / RAD
val f = lat / RAD
val q = qqq(E, StrictMath.sin(f))
val r = k / StrictMath.exp(q * sinfo)
val gam = (cm - l) * sinfo
val n = rb + nb - (r * StrictMath.cos(gam))
val e = eo + (r * StrictMath.sin(gam))
Point(e, n)
}
}
def toTransverseMercator(params: Array[Double]) = {
(point: Point) => {
point
}
}
def fromLambertConic(params: Array[Double]) = {
val cm = params(0) / RAD // CENTRAL MERIDIAN (CM)
val eo = params(1) // FALSE EASTING VALUE AT THE CM (METERS)
val nb = params(2) // FALSE NORTHING VALUE AT SOUTHERNMOST PARALLEL (METERS), (USUALLY ZERO)
val fis = params(3) / RAD // LATITUDE OF SO. STD. PARALLEL
val fin = params(4) / RAD // LATITUDE OF NO. STD. PARALLEL
val fib = params(5) / RAD // LATITUDE OF SOUTHERNMOST PARALLEL
val sinfs = StrictMath.sin(fis)
val cosfs = StrictMath.cos(fis)
val sinfn = StrictMath.sin(fin)
val cosfn = StrictMath.cos(fin)
val sinfb = StrictMath.sin(fib)
val qs = qqq(E, sinfs)
val qn = qqq(E, sinfn)
val qb = qqq(E, sinfb)
val w1 = StrictMath.sqrt(1.toDouble - ESQ * sinfs * sinfs)
val w2 = StrictMath.sqrt(1.toDouble - ESQ * sinfn * sinfn)
val sinfo = StrictMath.log(w2 * cosfs / (w1 * cosfn)) / (qn - qs)
val k = ER * cosfs * StrictMath.exp(qs * sinfo) / (w1 * sinfo)
val rb = k / StrictMath.exp(qb * sinfo)
(point: Point) => {
val easting = point.getX()
val northing = point.getY()
val npr = rb - northing + nb
val epr = easting - eo
val gam = StrictMath.atan(epr / npr)
val lon = cm - (gam / sinfo)
val rpt = StrictMath.sqrt(npr * npr + epr * epr)
val q = StrictMath.log(k / rpt) / sinfo
val temp = StrictMath.exp(q + q)
var sine = (temp - 1.toDouble) / (temp + 1.toDouble)
var f1, f2 = 0.0
for (i <- 0 until 2) {
f1 = ((StrictMath.log((1.toDouble + sine) / (1.toDouble - sine)) - E *
StrictMath.log((1.toDouble + E * sine) / (1.toDouble - E * sine))) / 2.toDouble) - q
f2 = 1.toDouble / (1.toDouble - sine * sine) - ESQ / (1.toDouble - ESQ * sine * sine)
sine -= (f1/ f2)
}
Point(StrictMath.toDegrees(lon) * -1, StrictMath.toDegrees(StrictMath.asin(sine)))
}
}
def fromTransverseMercator(params: Array[Double]) = {
val cm = params(0) // CENTRAL MERIDIAN (CM)
val fe = params(1) // FALSE EASTING VALUE AT THE CM (METERS)
val or = params(2) / RAD // origin latitude
val sf = 1.0 - (1.0 / params(3)) // scale factor
val fn = params(4) // false northing
// translated from TCONPC subroutine
val eps = ESQ / (1.0 - ESQ)
val pr = (1.0 - F) * ER
val en = (ER - pr) / (ER + pr)
val en2 = en * en
val en3 = en * en * en
val en4 = en2 * en2
var c2 = -3.0 * en / 2.0 + 9.0 * en3 / 16.0
var c4 = 15.0d * en2 / 16.0d - 15.0d * en4 /32.0
var c6 = -35.0 * en3 / 48.0
var c8 = 315.0 * en4 / 512.0
val u0 = 2.0 * (c2 - 2.0 * c4 + 3.0 * c6 - 4.0 * c8)
val u2 = 8.0 * (c4 - 4.0 * c6 + 10.0 * c8)
val u4 = 32.0 * (c6 - 6.0 * c8)
val u6 = 129.0 * c8
c2 = 3.0 * en / 2.0 - 27.0 * en3 / 32.0
c4 = 21.0 * en2 / 16.0 - 55.0 * en4 / 32.0d
c6 = 151.0 * en3 / 96.0
c8 = 1097.0d * en4 / 512.0
val v0 = 2.0 * (c2 - 2.0 * c4 + 3.0 * c6 - 4.0 * c8)
val v2 = 8.0 * (c4 - 4.0 * c6 + 10.0 * c8)
val v4 = 32.0 * (c6 - 6.0 * c8)
val v6 = 128.0 * c8
val r = ER * (1.0 - en) * (1.0 - en * en) * (1.0 + 2.25 * en * en + (225.0 / 64.0) * en4)
val cosor = StrictMath.cos(or)
val omo = or + StrictMath.sin(or) * cosor *
(u0 + u2 * cosor * cosor + u4 * StrictMath.pow(cosor, 4) + u6 * StrictMath.pow(cosor, 6))
val so = sf * r * omo
(point: Point) => {
val easting = point.getX()
val northing = point.getY()
// translated from TMGEOD subroutine
val om = (northing - fn + so) / (r * sf)
val cosom = StrictMath.cos(om)
val foot = om + StrictMath.sin(om) * cosom *
(v0 + v2 * cosom * cosom + v4 * StrictMath.pow(cosom, 4) + v6 * StrictMath.pow(cosom, 6))
val sinf = StrictMath.sin(foot)
val cosf = StrictMath.cos(foot)
val tn = sinf / cosf
val ts = tn * tn
val ets = eps * cosf * cosf
val rn = ER * sf / StrictMath.sqrt(1.0 - ESQ * sinf * sinf)
val q = (easting - fe) / rn
val qs = q * q
val b2 = -tn * (1.0 + ets) / 2.0
val b4 = -(5.0 + 3.0 * ts + ets * (1.0 - 9.0 * ts) - 4.0 * ets * ets) / 12.0
val b6 = (61.0 + 45.0 * ts * (2.0 + ts) + ets * (46.0 - 252.0 * ts -60.0 * ts * ts)) / 360.0
val b1 = 1.0
val b3 = -(1.0 + ts + ts + ets) / 6.0
val b5 = (5.0 + ts * (28.0 + 24.0 * ts) + ets * (6.0 + 8.0 * ts)) / 120.0
val b7 = -(61.0 + 662.0 * ts + 1320.0 * ts * ts + 720.0 * StrictMath.pow(ts, 3)) / 5040.0
val lat = foot + b2 * qs * (1.0 + qs * (b4 + b6 * qs))
val l = b1 * q * (1.0 + qs * (b3 + qs * (b5 + b7 * qs)))
val lon = -l / cosf + cm
Point(StrictMath.toDegrees(lon) * -1, StrictMath.toDegrees(lat))
}
}
}
import magellan.Point
defined class NAD83
val transformer: Point => Point = (point: Point) => {
  val from = new NAD83(Map("zone" -> 403)).from() // zone 403 covers the San Francisco area
  val p = point.transform(from)                   // (lon, lat) -> NAD83 state-plane metres
  Point(3.28084 * p.getX, 3.28084 * p.getY)       // metres -> feet (1 m = 3.28084 ft), matching the shapefile's units
}
// add a new column in nad83 coordinates
val uberTransformed = uber
.withColumn("nad83", $"point".transform(transformer))
.cache()
transformer: magellan.Point => magellan.Point = <function1>
uberTransformed: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [tripId: string, timestamp: string ... 2 more fields]
uberTransformed.count()
res43: Long = 1128663
uberTransformed.show(5,false) // nad83 transformed points
+------+-------------------------+-----------------------------+---------------------------------------------+
|tripId|timestamp |point |nad83 |
+------+-------------------------+-----------------------------+---------------------------------------------+
|00001 |2007-01-07T10:54:50+00:00|Point(-122.445368, 37.782551)|Point(5999523.477715266, 2113253.7290443885) |
|00001 |2007-01-07T10:54:54+00:00|Point(-122.444586, 37.782745)|Point(5999750.8888492435, 2113319.6570987953)|
|00001 |2007-01-07T10:54:58+00:00|Point(-122.443688, 37.782842)|Point(6000011.08106823, 2113349.5785887106) |
|00001 |2007-01-07T10:55:02+00:00|Point(-122.442815, 37.782919)|Point(6000263.898268142, 2113372.3716762937) |
|00001 |2007-01-07T10:55:06+00:00|Point(-122.442112, 37.782992)|Point(6000467.566895697, 2113394.7303657546) |
+------+-------------------------+-----------------------------+---------------------------------------------+
only showing top 5 rows
uberTransformed.select("tripId").distinct().count() // number of unique tripIds
res45: Long = 24999
Let's try the join again after the appropriate coordinate-system transformation.
val joined = neighborhoods
.join(uberTransformed)
.where($"nad83" within $"polygon")
.select($"tripId", $"timestamp", explode($"metadata").as(Seq("k", "v")))
.withColumnRenamed("v", "neighborhood")
.drop("k")
.cache()
joined: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [tripId: string, timestamp: string ... 1 more field]
val UberRecordsInNbhdsCount = joined.count() // about 131 seconds for first action (doing a broadcast nested loop join, as the physical plan below shows)
UberRecordsInNbhdsCount: Long = 1085087
joined.explain
== Physical Plan ==
InMemoryTableScan [tripId#469, timestamp#470, neighborhood#929]
+- InMemoryRelation [tripId#469, timestamp#470, neighborhood#929], StorageLevel(disk, memory, deserialized, 1 replicas)
+- *(1) Project [tripId#469, timestamp#470, v#924 AS neighborhood#929]
+- *(1) Generate explode(metadata#580), [tripId#469, timestamp#470], false, [k#923, v#924]
+- *(1) Project [metadata#580, tripId#469, timestamp#470]
+- *(1) BroadcastNestedLoopJoin BuildLeft, Inner, Within(nad83#745, polygon#579)
:- BroadcastExchange IdentityBroadcastMode, [id=#1719]
: +- InMemoryTableScan [polygon#579, metadata#580]
: +- InMemoryRelation [polygon#579, metadata#580], StorageLevel(disk, memory, deserialized, 1 replicas)
: +- *(1) Scan ShapeFileRelation(dbfs:/datasets/magellan/SFNbhd/,Map(path -> dbfs:/datasets/magellan/SFNbhd/)) [polygon#579,metadata#580] PushedFilters: [], ReadSchema: struct<polygon:struct<type:int,xmin:double,ymin:double,xmax:double,ymax:double,indices:array<int>...
+- InMemoryTableScan [tripId#469, timestamp#470, nad83#745]
+- InMemoryRelation [tripId#469, timestamp#470, point#471, nad83#745], StorageLevel(disk, memory, deserialized, 1 replicas)
+- *(1) Project [tripId#469, timestamp#470, point#471, transformer(point#471, <function1>) AS nad83#745]
+- InMemoryTableScan [point#471, timestamp#470, tripId#469]
+- InMemoryRelation [tripId#469, timestamp#470, point#471], StorageLevel(disk, memory, deserialized, 1 replicas)
+- *(1) SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(input[0, line891d49738e2e4c728aab43b5afc9663a112.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$UberRecord, true]).tripId, true, false) AS tripId#469, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(input[0, line891d49738e2e4c728aab43b5afc9663a112.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$UberRecord, true]).timestamp, true, false) AS timestamp#470, newInstance(class org.apache.spark.sql.types.PointUDT).serialize AS point#471]
+- Scan[obj#468]
joined.show(5,false)
+------+-------------------------+-------------------------+
|tripId|timestamp |neighborhood |
+------+-------------------------+-------------------------+
|00001 |2007-01-07T10:54:50+00:00|Western Addition |
|00001 |2007-01-07T10:54:54+00:00|Western Addition |
|00001 |2007-01-07T10:54:58+00:00|Western Addition |
|00001 |2007-01-07T10:55:02+00:00|Western Addition |
|00001 |2007-01-07T10:55:06+00:00|Western Addition |
+------+-------------------------+-------------------------+
only showing top 5 rows
uberRecordCount - UberRecordsInNbhdsCount // records not in the neighbourhood shape files
res49: Long = 43576
joined
.groupBy($"neighborhood")
.agg(countDistinct("tripId")
.as("trips"))
.orderBy(col("trips").desc)
.show(5,false)
+-------------------------+-----+
|neighborhood |trips|
+-------------------------+-----+
|South of Market |9891 |
|Western Addition |6794 |
|Downtown/Civic Center |6697 |
|Financial District |6038 |
|Mission |5620 |
+-------------------------+-----+
only showing top 5 rows
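The aggregation above (count distinct trips per neighborhood, sorted in descending order) can be mimicked on a plain Scala collection; a small sketch with made-up records:

```scala
// (tripId, neighborhood) records; a trip generates many records.
val records = Seq(
  ("00001", "Mission"), ("00001", "Mission"), ("00002", "Mission"),
  ("00002", "South of Market"), ("00003", "South of Market"),
  ("00004", "South of Market")
)

// Distinct tripIds per neighborhood, ordered by count descending,
// mirroring groupBy + countDistinct + orderBy(desc) in Spark SQL.
val trips = records.distinct
  .groupBy(_._2)
  .map { case (nbhd, recs) => (nbhd, recs.map(_._1).distinct.size) }
  .toSeq
  .sortBy { case (_, n) => -n }
// Seq(("South of Market", 3), ("Mission", 2))
```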
Other spatial algorithms in Spark are being explored for more generic and efficient scalable geospatial analytic tasks.
Read on for more spatial indexing structures.
- SpatialSpark aims to provide efficient spatial operations using Apache Spark.
- Spatial Partition
- Generate a spatial partition from input dataset, currently Fixed-Grid Partition (FGP), Binary-Split Partition (BSP) and Sort-Tile Partition (STP) are supported.
- Spatial Range Query
- includes both indexed and non-indexed query (useful for neighbourhood searches)
- z-order Knn join
- A space-filling curve trick to index multi-dimensional metric data into 1 Dimension. See: ieee paper and the slides.
- AkNN = All K Nearest Neighbours - identify the k nearest neighbours for all nodes simultaneously (continuous AkNN is the streaming form of AkNN)
- need to identify the right resources to do this scalably.
- spark-knn-graphs: https://github.com/tdebatty/spark-knn-graphs
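The z-order (Morton) trick mentioned above interleaves the bits of the two coordinates so that points close in 2D tend to get nearby 1D keys. Below is a minimal pure-Scala sketch for 16-bit non-negative integer coordinates; real implementations would first quantize longitude/latitude onto such a grid:

```scala
// Interleave the low 16 bits of x and y into a 32-bit Morton code:
// bit i of x goes to bit 2i, bit i of y goes to bit 2i + 1.
def mortonEncode(x: Int, y: Int): Long = {
  var code = 0L
  for (i <- 0 until 16) {
    code |= ((x >> i) & 1).toLong << (2 * i)
    code |= ((y >> i) & 1).toLong << (2 * i + 1)
  }
  code
}

mortonEncode(0, 0) // 0
mortonEncode(1, 0) // 1
mortonEncode(0, 1) // 2
mortonEncode(1, 1) // 3
// Nearby cells tend to get nearby keys, so a 1D range scan over the
// Morton key approximates a 2D neighbourhood search.
```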
Step 0: Downloading datasets and loading them into dbfs
- get the Uber data
- get the San Francisco neighborhood data
ls
conf
derby.log
eventlogs
ganglia
logs
wget https://raw.githubusercontent.com/dima42/uber-gps-analysis/master/gpsdata/all.tsv
#wget http://lamastex.org/datasets/public/geospatial/uber/all.tsv
--2022-02-01 14:21:17-- https://raw.githubusercontent.com/dima42/uber-gps-analysis/master/gpsdata/all.tsv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 60947802 (58M) [text/plain]
Saving to: ‘all.tsv’
5450K .......... .......... .......... .......... .......... 9% 134M 1s
5500K .......... .......... .......... .......... .......... 9% 140M 1s
5550K .......... .......... .......... .......... .......... 9% 93.9M 1s
5600K .......... .......... .......... .......... .......... 9% 125M 1s
5650K .......... .......... .......... .......... .......... 9% 118M 1s
5700K .......... .......... .......... .......... .......... 9% 99.6M 1s
5750K .......... .......... .......... .......... .......... 9% 121M 1s
5800K .......... .......... .......... .......... .......... 9% 107M 1s
5850K .......... .......... .......... .......... .......... 9% 142M 1s
5900K .......... .......... .......... .......... .......... 9% 156M 1s
5950K .......... .......... .......... .......... .......... 10% 104M 1s
6000K .......... .......... .......... .......... .......... 10% 145M 1s
6050K .......... .......... .......... .......... .......... 10% 153M 1s
6100K .......... .......... .......... .......... .......... 10% 104M 1s
6150K .......... .......... .......... .......... .......... 10% 94.7M 1s
6200K .......... .......... .......... .......... .......... 10% 71.3M 1s
6250K .......... .......... .......... .......... .......... 10% 104M 1s
6300K .......... .......... .......... .......... .......... 10% 82.2M 1s
6350K .......... .......... .......... .......... .......... 10% 86.7M 1s
6400K .......... .......... .......... .......... .......... 10% 105M 1s
6450K .......... .......... .......... .......... .......... 10% 100M 1s
6500K .......... .......... .......... .......... .......... 11% 112M 1s
6550K .......... .......... .......... .......... .......... 11% 66.9M 1s
6600K .......... .......... .......... .......... .......... 11% 137M 1s
6650K .......... .......... .......... .......... .......... 11% 102M 1s
6700K .......... .......... .......... .......... .......... 11% 116M 1s
6750K .......... .......... .......... .......... .......... 11% 115M 1s
6800K .......... .......... .......... .......... .......... 11% 108M 1s
6850K .......... .......... .......... .......... .......... 11% 143M 1s
6900K .......... .......... .......... .......... .......... 11% 94.0M 1s
6950K .......... .......... .......... .......... .......... 11% 100M 1s
7000K .......... .......... .......... .......... .......... 11% 153M 1s
7050K .......... .......... .......... .......... .......... 11% 77.2M 1s
7100K .......... .......... .......... .......... .......... 12% 110M 1s
7150K .......... .......... .......... .......... .......... 12% 63.3M 1s
7200K .......... .......... .......... .......... .......... 12% 86.5M 1s
7250K .......... .......... .......... .......... .......... 12% 88.8M 1s
7300K .......... .......... .......... .......... .......... 12% 109M 1s
7350K .......... .......... .......... .......... .......... 12% 85.1M 1s
7400K .......... .......... .......... .......... .......... 12% 127M 1s
7450K .......... .......... .......... .......... .......... 12% 128M 1s
7500K .......... .......... .......... .......... .......... 12% 129M 1s
7550K .......... .......... .......... .......... .......... 12% 141M 1s
7600K .......... .......... .......... .......... .......... 12% 115M 1s
7650K .......... .......... .......... .......... .......... 12% 122M 1s
7700K .......... .......... .......... .......... .......... 13% 151M 1s
7750K .......... .......... .......... .......... .......... 13% 34.2M 1s
7800K .......... .......... .......... .......... .......... 13% 28.5M 1s
7850K .......... .......... .......... .......... .......... 13% 30.1M 1s
7900K .......... .......... .......... .......... .......... 13% 31.6M 1s
7950K .......... .......... .......... .......... .......... 13% 26.2M 1s
8000K .......... .......... .......... .......... .......... 13% 29.8M 1s
8050K .......... .......... .......... .......... .......... 13% 31.2M 1s
8100K .......... .......... .......... .......... .......... 13% 30.2M 1s
8150K .......... .......... .......... .......... .......... 13% 24.6M 1s
8200K .......... .......... .......... .......... .......... 13% 30.7M 1s
8250K .......... .......... .......... .......... .......... 13% 33.0M 1s
8300K .......... .......... .......... .......... .......... 14% 40.0M 1s
8350K .......... .......... .......... .......... .......... 14% 33.0M 1s
8400K .......... .......... .......... .......... .......... 14% 31.6M 1s
8450K .......... .......... .......... .......... .......... 14% 35.9M 1s
8500K .......... .......... .......... .......... .......... 14% 37.3M 1s
8550K .......... .......... .......... .......... .......... 14% 25.2M 1s
8600K .......... .......... .......... .......... .......... 14% 31.4M 1s
8650K .......... .......... .......... .......... .......... 14% 34.6M 1s
8700K .......... .......... .......... .......... .......... 14% 35.7M 1s
8750K .......... .......... .......... .......... .......... 14% 25.8M 1s
8800K .......... .......... .......... .......... .......... 14% 28.4M 1s
8850K .......... .......... .......... .......... .......... 14% 29.3M 1s
8900K .......... .......... .......... .......... .......... 15% 28.5M 1s
8950K .......... .......... .......... .......... .......... 15% 30.5M 1s
9000K .......... .......... .......... .......... .......... 15% 26.6M 1s
9050K .......... .......... .......... .......... .......... 15% 28.8M 1s
9100K .......... .......... .......... .......... .......... 15% 28.9M 1s
9150K .......... .......... .......... .......... .......... 15% 27.5M 1s
9200K .......... .......... .......... .......... .......... 15% 29.0M 1s
9250K .......... .......... .......... .......... .......... 15% 30.5M 1s
9300K .......... .......... .......... .......... .......... 15% 29.2M 1s
9350K .......... .......... .......... .......... .......... 15% 25.7M 1s
9400K .......... .......... .......... .......... .......... 15% 25.8M 1s
9450K .......... .......... .......... .......... .......... 15% 29.3M 1s
9500K .......... .......... .......... .......... .......... 16% 3.77M 1s
9550K .......... .......... .......... .......... .......... 16% 19.9M 1s
9600K .......... .......... .......... .......... .......... 16% 22.7M 1s
9650K .......... .......... .......... .......... .......... 16% 31.6M 1s
9700K .......... .......... .......... .......... .......... 16% 26.2M 1s
9750K .......... .......... .......... .......... .......... 16% 21.1M 1s
9800K .......... .......... .......... .......... .......... 16% 18.2M 1s
9850K .......... .......... .......... .......... .......... 16% 26.4M 1s
9900K .......... .......... .......... .......... .......... 16% 38.4M 1s
9950K .......... .......... .......... .......... .......... 16% 29.7M 1s
10000K .......... .......... .......... .......... .......... 16% 35.5M 1s
10050K .......... .......... .......... .......... .......... 16% 32.0M 1s
10100K .......... .......... .......... .......... .......... 17% 31.2M 1s
10150K .......... .......... .......... .......... .......... 17% 29.4M 1s
10200K .......... .......... .......... .......... .......... 17% 29.2M 1s
10250K .......... .......... .......... .......... .......... 17% 33.4M 1s
10300K .......... .......... .......... .......... .......... 17% 35.5M 1s
10350K .......... .......... .......... .......... .......... 17% 25.0M 1s
10400K .......... .......... .......... .......... .......... 17% 22.0M 1s
10450K .......... .......... .......... .......... .......... 17% 29.7M 1s
10500K .......... .......... .......... .......... .......... 17% 28.5M 1s
10550K .......... .......... .......... .......... .......... 17% 30.8M 1s
10600K .......... .......... .......... .......... .......... 17% 99.1M 1s
10650K .......... .......... .......... .......... .......... 17% 114M 1s
10700K .......... .......... .......... .......... .......... 18% 104M 1s
10750K .......... .......... .......... .......... .......... 18% 90.1M 1s
10800K .......... .......... .......... .......... .......... 18% 106M 1s
10850K .......... .......... .......... .......... .......... 18% 108M 1s
10900K .......... .......... .......... .......... .......... 18% 108M 1s
10950K .......... .......... .......... .......... .......... 18% 98.7M 1s
11000K .......... .......... .......... .......... .......... 18% 103M 1s
11050K .......... .......... .......... .......... .......... 18% 97.0M 1s
11100K .......... .......... .......... .......... .......... 18% 105M 1s
11150K .......... .......... .......... .......... .......... 18% 101M 1s
11200K .......... .......... .......... .......... .......... 18% 83.4M 1s
11250K .......... .......... .......... .......... .......... 18% 106M 1s
11300K .......... .......... .......... .......... .......... 19% 125M 1s
11350K .......... .......... .......... .......... .......... 19% 107M 1s
11400K .......... .......... .......... .......... .......... 19% 103M 1s
11450K .......... .......... .......... .......... .......... 19% 98.8M 1s
11500K .......... .......... .......... .......... .......... 19% 107M 1s
11550K .......... .......... .......... .......... .......... 19% 117M 1s
11600K .......... .......... .......... .......... .......... 19% 120M 1s
11650K .......... .......... .......... .......... .......... 19% 125M 1s
11700K .......... .......... .......... .......... .......... 19% 114M 1s
11750K .......... .......... .......... .......... .......... 19% 104M 1s
11800K .......... .......... .......... .......... .......... 19% 115M 1s
11850K .......... .......... .......... .......... .......... 19% 120M 1s
11900K .......... .......... .......... .......... .......... 20% 118M 1s
11950K .......... .......... .......... .......... .......... 20% 113M 1s
12000K .......... .......... .......... .......... .......... 20% 119M 1s
12050K .......... .......... .......... .......... .......... 20% 115M 1s
12100K .......... .......... .......... .......... .......... 20% 106M 1s
12150K .......... .......... .......... .......... .......... 20% 105M 1s
12200K .......... .......... .......... .......... .......... 20% 122M 1s
12250K .......... .......... .......... .......... .......... 20% 124M 1s
12300K .......... .......... .......... .......... .......... 20% 120M 1s
12350K .......... .......... .......... .......... .......... 20% 116M 1s
12400K .......... .......... .......... .......... .......... 20% 115M 1s
12450K .......... .......... .......... .......... .......... 21% 107M 1s
12500K .......... .......... .......... .......... .......... 21% 119M 1s
12550K .......... .......... .......... .......... .......... 21% 79.7M 1s
12600K .......... .......... .......... .......... .......... 21% 159M 1s
12650K .......... .......... .......... .......... .......... 21% 142M 1s
12700K .......... .......... .......... .......... .......... 21% 132M 1s
12750K .......... .......... .......... .......... .......... 21% 80.5M 1s
12800K .......... .......... .......... .......... .......... 21% 121M 1s
12850K .......... .......... .......... .......... .......... 21% 112M 1s
12900K .......... .......... .......... .......... .......... 21% 119M 1s
12950K .......... .......... .......... .......... .......... 21% 92.5M 1s
13000K .......... .......... .......... .......... .......... 21% 100M 1s
13050K .......... .......... .......... .......... .......... 22% 110M 1s
13100K .......... .......... .......... .......... .......... 22% 118M 1s
13150K .......... .......... .......... .......... .......... 22% 112M 1s
13200K .......... .......... .......... .......... .......... 22% 5.60M 1s
13250K .......... .......... .......... .......... .......... 22% 114M 1s
13300K .......... .......... .......... .......... .......... 22% 100M 1s
13350K .......... .......... .......... .......... .......... 22% 102M 1s
13400K .......... .......... .......... .......... .......... 22% 105M 1s
13450K .......... .......... .......... .......... .......... 22% 127M 1s
13500K .......... .......... .......... .......... .......... 22% 93.5M 1s
13550K .......... .......... .......... .......... .......... 22% 88.1M 1s
13600K .......... .......... .......... .......... .......... 22% 120M 1s
13650K .......... .......... .......... .......... .......... 23% 128M 1s
13700K .......... .......... .......... .......... .......... 23% 101M 1s
13750K .......... .......... .......... .......... .......... 23% 109M 1s
13800K .......... .......... .......... .......... .......... 23% 150M 1s
13850K .......... .......... .......... .......... .......... 23% 98.1M 1s
13900K .......... .......... .......... .......... .......... 23% 119M 1s
13950K .......... .......... .......... .......... .......... 23% 90.2M 1s
14000K .......... .......... .......... .......... .......... 23% 125M 1s
14050K .......... .......... .......... .......... .......... 23% 119M 1s
14100K .......... .......... .......... .......... .......... 23% 119M 1s
14150K .......... .......... .......... .......... .......... 23% 104M 1s
14200K .......... .......... .......... .......... .......... 23% 127M 1s
14250K .......... .......... .......... .......... .......... 24% 117M 1s
14300K .......... .......... .......... .......... .......... 24% 123M 1s
14350K .......... .......... .......... .......... .......... 24% 89.7M 1s
14400K .......... .......... .......... .......... .......... 24% 113M 1s
14450K .......... .......... .......... .......... .......... 24% 107M 1s
14500K .......... .......... .......... .......... .......... 24% 125M 1s
14550K .......... .......... .......... .......... .......... 24% 97.4M 1s
14600K .......... .......... .......... .......... .......... 24% 111M 1s
14650K .......... .......... .......... .......... .......... 24% 123M 1s
14700K .......... .......... .......... .......... .......... 24% 114M 1s
14750K .......... .......... .......... .......... .......... 24% 104M 1s
14800K .......... .......... .......... .......... .......... 24% 108M 1s
14850K .......... .......... .......... .......... .......... 25% 114M 1s
14900K .......... .......... .......... .......... .......... 25% 114M 1s
14950K .......... .......... .......... .......... .......... 25% 101M 1s
15000K .......... .......... .......... .......... .......... 25% 112M 1s
15050K .......... .......... .......... .......... .......... 25% 111M 1s
15100K .......... .......... .......... .......... .......... 25% 119M 1s
15150K .......... .......... .......... .......... .......... 25% 105M 1s
15200K .......... .......... .......... .......... .......... 25% 116M 1s
15250K .......... .......... .......... .......... .......... 25% 114M 1s
15300K .......... .......... .......... .......... .......... 25% 127M 1s
15350K .......... .......... .......... .......... .......... 25% 104M 1s
15400K .......... .......... .......... .......... .......... 25% 125M 1s
15450K .......... .......... .......... .......... .......... 26% 124M 1s
15500K .......... .......... .......... .......... .......... 26% 105M 1s
15550K .......... .......... .......... .......... .......... 26% 104M 1s
15600K .......... .......... .......... .......... .......... 26% 106M 1s
15650K .......... .......... .......... .......... .......... 26% 120M 1s
15700K .......... .......... .......... .......... .......... 26% 111M 1s
15750K .......... .......... .......... .......... .......... 26% 95.2M 1s
15800K .......... .......... .......... .......... .......... 26% 126M 1s
15850K .......... .......... .......... .......... .......... 26% 108M 1s
15900K .......... .......... .......... .......... .......... 26% 119M 1s
15950K .......... .......... .......... .......... .......... 26% 89.3M 1s
16000K .......... .......... .......... .......... .......... 26% 111M 1s
16050K .......... .......... .......... .......... .......... 27% 122M 1s
16100K .......... .......... .......... .......... .......... 27% 114M 1s
16150K .......... .......... .......... .......... .......... 27% 93.8M 1s
*** WARNING: skipped 41040 bytes of output ***
43200K .......... .......... .......... .......... .......... 72% 124M 0s
43250K .......... .......... .......... .......... .......... 72% 107M 0s
43300K .......... .......... .......... .......... .......... 72% 11.8M 0s
43350K .......... .......... .......... .......... .......... 72% 52.4M 0s
43400K .......... .......... .......... .......... .......... 73% 66.4M 0s
43450K .......... .......... .......... .......... .......... 73% 68.0M 0s
43500K .......... .......... .......... .......... .......... 73% 115M 0s
43550K .......... .......... .......... .......... .......... 73% 109M 0s
43600K .......... .......... .......... .......... .......... 73% 111M 0s
43650K .......... .......... .......... .......... .......... 73% 117M 0s
43700K .......... .......... .......... .......... .......... 73% 55.8M 0s
43750K .......... .......... .......... .......... .......... 73% 101M 0s
43800K .......... .......... .......... .......... .......... 73% 113M 0s
43850K .......... .......... .......... .......... .......... 73% 125M 0s
43900K .......... .......... .......... .......... .......... 73% 118M 0s
43950K .......... .......... .......... .......... .......... 73% 100M 0s
44000K .......... .......... .......... .......... .......... 74% 129M 0s
44050K .......... .......... .......... .......... .......... 74% 124M 0s
44100K .......... .......... .......... .......... .......... 74% 50.6M 0s
44150K .......... .......... .......... .......... .......... 74% 50.6M 0s
44200K .......... .......... .......... .......... .......... 74% 66.3M 0s
44250K .......... .......... .......... .......... .......... 74% 68.9M 0s
44300K .......... .......... .......... .......... .......... 74% 66.5M 0s
44350K .......... .......... .......... .......... .......... 74% 61.2M 0s
44400K .......... .......... .......... .......... .......... 74% 125M 0s
44450K .......... .......... .......... .......... .......... 74% 132M 0s
44500K .......... .......... .......... .......... .......... 74% 108M 0s
44550K .......... .......... .......... .......... .......... 74% 100M 0s
44600K .......... .......... .......... .......... .......... 75% 118M 0s
44650K .......... .......... .......... .......... .......... 75% 119M 0s
44700K .......... .......... .......... .......... .......... 75% 96.6M 0s
44750K .......... .......... .......... .......... .......... 75% 75.6M 0s
44800K .......... .......... .......... .......... .......... 75% 84.7M 0s
44850K .......... .......... .......... .......... .......... 75% 93.7M 0s
44900K .......... .......... .......... .......... .......... 75% 26.2M 0s
44950K .......... .......... .......... .......... .......... 75% 51.1M 0s
45000K .......... .......... .......... .......... .......... 75% 57.5M 0s
45050K .......... .......... .......... .......... .......... 75% 52.9M 0s
45100K .......... .......... .......... .......... .......... 75% 54.3M 0s
45150K .......... .......... .......... .......... .......... 75% 55.2M 0s
45200K .......... .......... .......... .......... .......... 76% 66.7M 0s
45250K .......... .......... .......... .......... .......... 76% 57.9M 0s
45300K .......... .......... .......... .......... .......... 76% 63.0M 0s
45350K .......... .......... .......... .......... .......... 76% 49.8M 0s
45400K .......... .......... .......... .......... .......... 76% 64.7M 0s
45450K .......... .......... .......... .......... .......... 76% 67.4M 0s
45500K .......... .......... .......... .......... .......... 76% 66.6M 0s
45550K .......... .......... .......... .......... .......... 76% 71.5M 0s
45600K .......... .......... .......... .......... .......... 76% 127M 0s
45650K .......... .......... .......... .......... .......... 76% 118M 0s
45700K .......... .......... .......... .......... .......... 76% 119M 0s
45750K .......... .......... .......... .......... .......... 76% 110M 0s
45800K .......... .......... .......... .......... .......... 77% 56.2M 0s
45850K .......... .......... .......... .......... .......... 77% 58.3M 0s
45900K .......... .......... .......... .......... .......... 77% 62.2M 0s
45950K .......... .......... .......... .......... .......... 77% 49.2M 0s
46000K .......... .......... .......... .......... .......... 77% 44.7M 0s
46050K .......... .......... .......... .......... .......... 77% 104M 0s
46100K .......... .......... .......... .......... .......... 77% 123M 0s
46150K .......... .......... .......... .......... .......... 77% 95.9M 0s
46200K .......... .......... .......... .......... .......... 77% 94.1M 0s
46250K .......... .......... .......... .......... .......... 77% 105M 0s
46300K .......... .......... .......... .......... .......... 77% 114M 0s
46350K .......... .......... .......... .......... .......... 77% 101M 0s
46400K .......... .......... .......... .......... .......... 78% 115M 0s
46450K .......... .......... .......... .......... .......... 78% 59.9M 0s
46500K .......... .......... .......... .......... .......... 78% 66.9M 0s
46550K .......... .......... .......... .......... .......... 78% 61.3M 0s
46600K .......... .......... .......... .......... .......... 78% 119M 0s
46650K .......... .......... .......... .......... .......... 78% 117M 0s
46700K .......... .......... .......... .......... .......... 78% 118M 0s
46750K .......... .......... .......... .......... .......... 78% 73.3M 0s
46800K .......... .......... .......... .......... .......... 78% 43.7M 0s
46850K .......... .......... .......... .......... .......... 78% 65.8M 0s
46900K .......... .......... .......... .......... .......... 78% 69.9M 0s
46950K .......... .......... .......... .......... .......... 78% 91.2M 0s
47000K .......... .......... .......... .......... .......... 79% 115M 0s
47050K .......... .......... .......... .......... .......... 79% 59.8M 0s
47100K .......... .......... .......... .......... .......... 79% 67.6M 0s
47150K .......... .......... .......... .......... .......... 79% 109M 0s
47200K .......... .......... .......... .......... .......... 79% 123M 0s
47250K .......... .......... .......... .......... .......... 79% 116M 0s
47300K .......... .......... .......... .......... .......... 79% 90.3M 0s
47350K .......... .......... .......... .......... .......... 79% 97.4M 0s
47400K .......... .......... .......... .......... .......... 79% 84.2M 0s
47450K .......... .......... .......... .......... .......... 79% 44.8M 0s
47500K .......... .......... .......... .......... .......... 79% 63.1M 0s
47550K .......... .......... .......... .......... .......... 79% 73.1M 0s
47600K .......... .......... .......... .......... .......... 80% 114M 0s
47650K .......... .......... .......... .......... .......... 80% 125M 0s
47700K .......... .......... .......... .......... .......... 80% 98.4M 0s
47750K .......... .......... .......... .......... .......... 80% 52.6M 0s
47800K .......... .......... .......... .......... .......... 80% 108M 0s
47850K .......... .......... .......... .......... .......... 80% 123M 0s
47900K .......... .......... .......... .......... .......... 80% 120M 0s
47950K .......... .......... .......... .......... .......... 80% 106M 0s
48000K .......... .......... .......... .......... .......... 80% 119M 0s
48050K .......... .......... .......... .......... .......... 80% 122M 0s
48100K .......... .......... .......... .......... .......... 80% 56.2M 0s
48150K .......... .......... .......... .......... .......... 80% 47.5M 0s
48200K .......... .......... .......... .......... .......... 81% 55.7M 0s
48250K .......... .......... .......... .......... .......... 81% 65.2M 0s
48300K .......... .......... .......... .......... .......... 81% 84.8M 0s
48350K .......... .......... .......... .......... .......... 81% 11.5M 0s
48400K .......... .......... .......... .......... .......... 81% 17.6M 0s
48450K .......... .......... .......... .......... .......... 81% 48.7M 0s
48500K .......... .......... .......... .......... .......... 81% 77.4M 0s
48550K .......... .......... .......... .......... .......... 81% 5.73M 0s
48600K .......... .......... .......... .......... .......... 81% 19.9M 0s
48650K .......... .......... .......... .......... .......... 81% 15.9M 0s
48700K .......... .......... .......... .......... .......... 81% 22.3M 0s
48750K .......... .......... .......... .......... .......... 81% 19.3M 0s
48800K .......... .......... .......... .......... .......... 82% 21.3M 0s
48850K .......... .......... .......... .......... .......... 82% 21.8M 0s
48900K .......... .......... .......... .......... .......... 82% 19.3M 0s
48950K .......... .......... .......... .......... .......... 82% 15.5M 0s
49000K .......... .......... .......... .......... .......... 82% 12.9M 0s
49050K .......... .......... .......... .......... .......... 82% 40.4M 0s
49100K .......... .......... .......... .......... .......... 82% 41.2M 0s
49150K .......... .......... .......... .......... .......... 82% 35.8M 0s
49200K .......... .......... .......... .......... .......... 82% 40.8M 0s
49250K .......... .......... .......... .......... .......... 82% 41.4M 0s
49300K .......... .......... .......... .......... .......... 82% 41.4M 0s
49350K .......... .......... .......... .......... .......... 82% 37.9M 0s
49400K .......... .......... .......... .......... .......... 83% 33.0M 0s
49450K .......... .......... .......... .......... .......... 83% 34.4M 0s
49500K .......... .......... .......... .......... .......... 83% 43.2M 0s
49550K .......... .......... .......... .......... .......... 83% 37.7M 0s
49600K .......... .......... .......... .......... .......... 83% 42.6M 0s
49650K .......... .......... .......... .......... .......... 83% 42.5M 0s
49700K .......... .......... .......... .......... .......... 83% 42.5M 0s
49750K .......... .......... .......... .......... .......... 83% 41.0M 0s
49800K .......... .......... .......... .......... .......... 83% 110M 0s
49850K .......... .......... .......... .......... .......... 83% 80.4M 0s
49900K .......... .......... .......... .......... .......... 83% 80.0M 0s
49950K .......... .......... .......... .......... .......... 84% 116M 0s
50000K .......... .......... .......... .......... .......... 84% 86.4M 0s
50050K .......... .......... .......... .......... .......... 84% 92.7M 0s
50100K .......... .......... .......... .......... .......... 84% 91.8M 0s
50150K .......... .......... .......... .......... .......... 84% 88.4M 0s
50200K .......... .......... .......... .......... .......... 84% 90.7M 0s
50250K .......... .......... .......... .......... .......... 84% 90.6M 0s
50300K .......... .......... .......... .......... .......... 84% 101M 0s
50350K .......... .......... .......... .......... .......... 84% 115M 0s
50400K .......... .......... .......... .......... .......... 84% 215M 0s
50450K .......... .......... .......... .......... .......... 84% 248M 0s
50500K .......... .......... .......... .......... .......... 84% 248M 0s
50550K .......... .......... .......... .......... .......... 85% 208M 0s
50600K .......... .......... .......... .......... .......... 85% 245M 0s
50650K .......... .......... .......... .......... .......... 85% 190M 0s
50700K .......... .......... .......... .......... .......... 85% 246M 0s
50750K .......... .......... .......... .......... .......... 85% 225M 0s
50800K .......... .......... .......... .......... .......... 85% 249M 0s
50850K .......... .......... .......... .......... .......... 85% 227M 0s
50900K .......... .......... .......... .......... .......... 85% 74.6M 0s
50950K .......... .......... .......... .......... .......... 85% 78.2M 0s
51000K .......... .......... .......... .......... .......... 85% 127M 0s
51050K .......... .......... .......... .......... .......... 85% 127M 0s
51100K .......... .......... .......... .......... .......... 85% 132M 0s
51150K .......... .......... .......... .......... .......... 86% 59.4M 0s
51200K .......... .......... .......... .......... .......... 86% 79.7M 0s
2022-02-01 14:21:20 (58.9 MB/s) - ‘all.tsv’ saved [60947802/60947802]
pwd
/databricks/driver
dbutils.fs.mkdirs("dbfs:/datasets/magellan") //need not be done again!
res55: Boolean = true
dbutils.fs.cp("file:/databricks/driver/all.tsv", "dbfs:/datasets/magellan/") // load into dbfs
res56: Boolean = true
display(dbutils.fs.ls("dbfs:/datasets/magellan/"))
| path | name | size |
|---|---|---|
| dbfs:/datasets/magellan/SFNbhd/ | SFNbhd/ | 0.0 |
| dbfs:/datasets/magellan/all.tsv | all.tsv | 6.0947802e7 |
wget http://www.lamastex.org/courses/ScalableDataScience/2016/datasets/magellan/UberSF/planning_neighborhoods.zip
--2022-02-01 14:21:24-- http://www.lamastex.org/courses/ScalableDataScience/2016/datasets/magellan/UberSF/planning_neighborhoods.zip
Resolving www.lamastex.org (www.lamastex.org)... 166.62.28.100
Connecting to www.lamastex.org (www.lamastex.org)|166.62.28.100|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 163771 (160K) [application/zip]
Saving to: ‘planning_neighborhoods.zip’
2022-02-01 14:21:25 (315 KB/s) - ‘planning_neighborhoods.zip’ saved [163771/163771]
unzip planning_neighborhoods.zip
Archive: planning_neighborhoods.zip
inflating: planning_neighborhoods.dbf
inflating: planning_neighborhoods.shx
inflating: planning_neighborhoods.shp.xml
inflating: planning_neighborhoods.shp
inflating: planning_neighborhoods.sbx
inflating: planning_neighborhoods.sbn
inflating: planning_neighborhoods.prj
ls -al
total 59968
drwxr-xr-x 1 root root 4096 Feb 1 14:21 .
drwxr-xr-x 1 root root 4096 Feb 1 13:54 ..
-rw-r--r-- 1 root root 60947802 Feb 1 14:21 all.tsv
drwxr-xr-x 2 root root 4096 Jan 1 1970 conf
-rw-r--r-- 1 root root 704 Feb 1 13:54 derby.log
drwxr-xr-x 3 root root 4096 Feb 1 13:54 eventlogs
drwxr-xr-x 2 root root 4096 Feb 1 14:15 ganglia
drwxr-xr-x 2 root root 4096 Feb 1 14:00 logs
-rw-r--r-- 1 root root 1028 Jan 20 2012 planning_neighborhoods.dbf
-rw-r--r-- 1 root root 567 Jan 20 2012 planning_neighborhoods.prj
-rw-r--r-- 1 root root 516 Jan 20 2012 planning_neighborhoods.sbn
-rw-r--r-- 1 root root 164 Jan 20 2012 planning_neighborhoods.sbx
-rw-r--r-- 1 root root 214576 Jan 20 2012 planning_neighborhoods.shp
-rw-r--r-- 1 root root 21958 Jan 20 2012 planning_neighborhoods.shp.xml
-rw-r--r-- 1 root root 396 Jan 20 2012 planning_neighborhoods.shx
-rw-r--r-- 1 root root 163771 Nov 9 2015 planning_neighborhoods.zip
mv planning_neighborhoods.zip orig_planning_neighborhoods.zip
Let's prepare the files in a local directory named `SFNbhd`:

- make a directory called `SFNbhd` using the command `mkdir SFNbhd`
- after making the directory, chain with `&&` to move the files starting with `planning_nei` into the directory we made, `SFNbhd`, by: `mv planning_nei* SFNbhd`
- list the contents of the current directory using `ls`
- finally, list the contents of the directory `SFNbhd` inside the current directory using `ls -al SFNbhd`
mkdir SFNbhd && mv planning_nei* SFNbhd && ls
ls -al SFNbhd
SFNbhd
all.tsv
conf
derby.log
eventlogs
ganglia
logs
orig_planning_neighborhoods.zip
total 264
drwxr-xr-x 2 root root 4096 Feb 1 14:21 .
drwxr-xr-x 1 root root 4096 Feb 1 14:21 ..
-rw-r--r-- 1 root root 1028 Jan 20 2012 planning_neighborhoods.dbf
-rw-r--r-- 1 root root 567 Jan 20 2012 planning_neighborhoods.prj
-rw-r--r-- 1 root root 516 Jan 20 2012 planning_neighborhoods.sbn
-rw-r--r-- 1 root root 164 Jan 20 2012 planning_neighborhoods.sbx
-rw-r--r-- 1 root root 214576 Jan 20 2012 planning_neighborhoods.shp
-rw-r--r-- 1 root root 21958 Jan 20 2012 planning_neighborhoods.shp.xml
-rw-r--r-- 1 root root 396 Jan 20 2012 planning_neighborhoods.shx
dbutils.fs.mkdirs("dbfs:/datasets/magellan/SFNbhd") //make the directory in dbfs - need not be done again!
res58: Boolean = true
// just copy each file - done for pedantic reasons; we can do more sophisticated dbfs loads for large shape files
dbutils.fs.cp("file:/databricks/driver/SFNbhd/planning_neighborhoods.dbf", "dbfs:/datasets/magellan/SFNbhd/")
dbutils.fs.cp("file:/databricks/driver/SFNbhd/planning_neighborhoods.prj", "dbfs:/datasets/magellan/SFNbhd/")
dbutils.fs.cp("file:/databricks/driver/SFNbhd/planning_neighborhoods.sbn", "dbfs:/datasets/magellan/SFNbhd/")
dbutils.fs.cp("file:/databricks/driver/SFNbhd/planning_neighborhoods.sbx", "dbfs:/datasets/magellan/SFNbhd/")
dbutils.fs.cp("file:/databricks/driver/SFNbhd/planning_neighborhoods.shp", "dbfs:/datasets/magellan/SFNbhd/")
dbutils.fs.cp("file:/databricks/driver/SFNbhd/planning_neighborhoods.shp.xml", "dbfs:/datasets/magellan/SFNbhd/")
dbutils.fs.cp("file:/databricks/driver/SFNbhd/planning_neighborhoods.shx", "dbfs:/datasets/magellan/SFNbhd/")
res59: Boolean = true
display(dbutils.fs.ls("dbfs:/datasets/magellan/SFNbhd/"))
| path | name | size |
|---|---|---|
| dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.dbf | planning_neighborhoods.dbf | 1028.0 |
| dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.prj | planning_neighborhoods.prj | 567.0 |
| dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.sbn | planning_neighborhoods.sbn | 516.0 |
| dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.sbx | planning_neighborhoods.sbx | 164.0 |
| dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.shp | planning_neighborhoods.shp | 214576.0 |
| dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.shp.xml | planning_neighborhoods.shp.xml | 21958.0 |
| dbfs:/datasets/magellan/SFNbhd/planning_neighborhoods.shx | planning_neighborhoods.shx | 396.0 |
By Marina Toger
TODO: Raaz - re-liven for 2021...
OSM
1. We define an area of interest and find the coordinates of its boundary, AKA its "bounding box". To do this, go to https://www.openstreetmap.org and zoom roughly into the desired area. You can then read off the coordinates of the bounding box by using the export option.
2. To ingest data from OSM we use wget, in the following format:

wget -O MyFileName.osm "https://api.openstreetmap.org/api/0.6/map?bbox=l,b,r,t"

- `MyFileName.osm`: give it some informative file name
- `l` = longitude of the LEFT boundary of the bounding box
- `b` = latitude of the BOTTOM boundary of the bounding box
- `r` = longitude of the RIGHT boundary of the bounding box
- `t` = latitude of the TOP boundary of the bounding box

For instance, if you know the bounding box, do:

- `TinyUppsalaCentrumWgot.osm`: a tiny area in Uppsala Centrum
- `l` = `17.63514`
- `b` = `59.85739`
- `r` = `17.64154`
- `t` = `59.86011`
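The bounding-box URL can be assembled from named variables before being handed to `wget`, which makes swapping in a different area less error-prone. A small sketch using the TinyUppsalaCentrum coordinates above (the variable names `L`, `B`, `R`, `T` are our own; the `wget` line is left commented so nothing is downloaded):

```shell
# left/bottom/right/top of the bounding box (Tiny Uppsala Centrum values)
L=17.63514   # left  (west)  longitude
B=59.85739   # bottom (south) latitude
R=17.64154   # right (east)  longitude
T=59.86011   # top   (north) latitude

# assemble the OSM API 0.6 map-endpoint URL
URL="https://api.openstreetmap.org/api/0.6/map?bbox=${L},${B},${R},${T}"
echo "$URL"
# wget -O TinyUppsalaCentrumWgot.osm "$URL"   # uncomment to actually download
```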
wget -O TinyUppsalaCentrumWgot.osm "https://api.openstreetmap.org/api/0.6/map?bbox=17.63514,59.85739,17.64154,59.86011"
//Imports
import magellan._
import magellan._
ls
conf
derby.log
eventlogs
ganglia
logs
wget -O UppsalaCentrumWgot.osm "https://api.openstreetmap.org/api/0.6/map?bbox=17.6244,59.8464,17.6661,59.8643"
--2022-02-01 14:26:09-- https://api.openstreetmap.org/api/0.6/map?bbox=17.6244,59.8464,17.6661,59.8643
Resolving api.openstreetmap.org (api.openstreetmap.org)... 130.117.76.11, 130.117.76.12, 130.117.76.13, ...
Connecting to api.openstreetmap.org (api.openstreetmap.org)|130.117.76.11|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/xml]
Saving to: ‘UppsalaCentrumWgot.osm’
2022-02-01 14:26:13 (5.79 MB/s) - ‘UppsalaCentrumWgot.osm’ saved [8667122]
pwd
ls
/databricks/driver
conf
derby.log
eventlogs
ganglia
logs
display(dbutils.fs.ls("dbfs:///datasets/"))
| path | name | size |
|---|---|---|
| dbfs:/datasets/alexandria/ | alexandria/ | 0.0 |
| dbfs:/datasets/beijing/ | beijing/ | 0.0 |
| dbfs:/datasets/magellan/ | magellan/ | 0.0 |
| dbfs:/datasets/maps/ | maps/ | 0.0 |
| dbfs:/datasets/mobile_sample/ | mobile_sample/ | 0.0 |
| dbfs:/datasets/osm/ | osm/ | 0.0 |
| dbfs:/datasets/sou/ | sou/ | 0.0 |
| dbfs:/datasets/t-drive-trips/ | t-drive-trips/ | 0.0 |
| dbfs:/datasets/t-drive-trips-magellan/ | t-drive-trips-magellan/ | 0.0 |
| dbfs:/datasets/taxis/ | taxis/ | 0.0 |
// making directory in distributed file system
dbutils.fs.mkdirs("dbfs:///datasets/maps/")
res1: Boolean = true
display(dbutils.fs.ls("dbfs:///datasets/maps/"))
| path | name | size |
|---|---|---|
| dbfs:/datasets/maps/StockholmCentrumWgot.osm | StockholmCentrumWgot.osm | 3820982.0 |
| dbfs:/datasets/maps/TinyUppsalaCentrumWgot.osm | TinyUppsalaCentrumWgot.osm | 919097.0 |
| dbfs:/datasets/maps/UppsalaCentrumWgot.osm | UppsalaCentrumWgot.osm | 8667122.0 |
// copy file from local fs to dbfs
dbutils.fs.cp("file:///databricks/driver/UppsalaCentrumWgot.osm","dbfs:///datasets/maps/")
res4: Boolean = true
display(dbutils.fs.ls("dbfs:///datasets/maps/"))
| path | name | size |
|---|---|---|
| dbfs:/datasets/maps/StockholmCentrumWgot.osm | StockholmCentrumWgot.osm | 3820982.0 |
| dbfs:/datasets/maps/TinyUppsalaCentrumWgot.osm | TinyUppsalaCentrumWgot.osm | 919097.0 |
| dbfs:/datasets/maps/UppsalaCentrumWgot.osm | UppsalaCentrumWgot.osm | 8667122.0 |
//Read the data from dbfs
val path = "dbfs:/datasets/maps/UppsalaCentrumWgot.osm"
val uppsalaCentrumOsmDF = spark.read
.format("magellan")
.option("type", "osm")
.load(path)
path: String = dbfs:/datasets/maps/UppsalaCentrumWgot.osm
uppsalaCentrumOsmDF: org.apache.spark.sql.DataFrame = [point: point, polyline: polyline ... 3 more fields]
uppsalaCentrumOsmDF.show()
+-----+--------------------+--------------------+--------------------+-----+
|point| polyline| polygon| metadata|valid|
+-----+--------------------+--------------------+--------------------+-----+
| null|magellan.PolyLine...| null|[electrified -> c...| true|
| null|magellan.PolyLine...| null| []| true|
| null|magellan.PolyLine...| null|[electrified -> c...| true|
| null|magellan.PolyLine...| null|[electrified -> c...| true|
| null|magellan.PolyLine...| null|[electrified -> c...| true|
| null|magellan.PolyLine...| null|[electrified -> c...| true|
| null|magellan.PolyLine...| null|[electrified -> c...| true|
| null|magellan.PolyLine...| null| [landuse -> grass]| true|
| null| null|magellan.Polygon@...| []| true|
| null| null|magellan.Polygon@...|[natural -> grass...| true|
| null| null|magellan.Polygon@...| [landuse -> grass]| true|
| null| null|magellan.Polygon@...| [landuse -> grass]| true|
| null| null|magellan.Polygon@...| [landuse -> grass]| true|
| null| null|magellan.Polygon@...| [leisure -> park]| true|
| null|magellan.PolyLine...| null|[highway -> cycle...| true|
| null|magellan.PolyLine...| null|[leisure -> park,...| true|
| null|magellan.PolyLine...| null|[bicycle -> desig...| true|
| null|magellan.PolyLine...| null|[name -> Bolandgy...| true|
| null| null|magellan.Polygon@...| [building -> yes]| true|
| null| null|magellan.Polygon@...|[electrified -> n...| true|
+-----+--------------------+--------------------+--------------------+-----+
only showing top 20 rows
display(uppsalaCentrumOsmDF)
uppsalaCentrumOsmDF.count
res9: Long = 32112
wget -O TinyUppsalaCentrumWgot.osm "https://api.openstreetmap.org/api/0.6/map?bbox=17.63514,59.85739,17.64154,59.86011"
--2022-02-02 08:56:29-- https://api.openstreetmap.org/api/0.6/map?bbox=17.63514,59.85739,17.64154,59.86011
Resolving api.openstreetmap.org (api.openstreetmap.org)... 130.117.76.13, 130.117.76.11, 130.117.76.12, ...
Connecting to api.openstreetmap.org (api.openstreetmap.org)|130.117.76.13|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/xml]
Saving to: ‘TinyUppsalaCentrumWgot.osm’
0K .......... .......... .......... .......... .......... 1.32M
50K .......... .......... .......... .......... .......... 3.11M
100K .......... .......... .......... .......... .......... 3.17M
150K .......... .......... .......... .......... .......... 2.84M
200K .......... .......... .......... .......... .......... 4.22M
250K .......... .......... .......... .......... .......... 3.36M
300K .......... .......... .......... .......... .......... 3.69M
350K .......... .......... .......... .......... .......... 3.11M
400K .......... .......... .......... .......... .......... 4.36M
450K .......... .......... .......... .......... .......... 4.32M
500K .......... .......... .......... .......... .......... 4.20M
550K .......... .......... .......... .......... .......... 4.21M
600K .......... .......... .......... .......... .......... 4.49M
650K .......... .......... .......... .......... .......... 4.55M
700K .......... .......... .......... .......... .......... 4.71M
750K .......... .......... .......... .......... .......... 4.48M
800K .......... .......... .......... .......... .......... 5.36M
850K .......... .......... .......... .......... ....... 5.78M=0.2s
2022-02-02 08:56:29 (3.56 MB/s) - ‘TinyUppsalaCentrumWgot.osm’ saved [919096]
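For reference, the map endpoint's bbox parameter is ordered minLon,minLat,maxLon,maxLat (longitude before latitude, an easy gotcha). A small Scala sketch assembling such a URL, with the coordinates copied from the wget call above (the value names are ours):

```scala
// OSM map API bbox order: min longitude, min latitude, max longitude, max latitude
val (minLon, minLat, maxLon, maxLat) = (17.63514, 59.85739, 17.64154, 59.86011)
val osmUrl = s"https://api.openstreetmap.org/api/0.6/map?bbox=$minLon,$minLat,$maxLon,$maxLat"
```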
pwd
ls
/databricks/driver
TinyUppsalaCentrumWgot.osm
conf
derby.log
eventlogs
ganglia
logs
// copy file from local fs to dbfs
dbutils.fs.cp("file:///databricks/driver/TinyUppsalaCentrumWgot.osm","dbfs:///datasets/maps/")
display(dbutils.fs.ls("dbfs:///datasets/maps/"))
| path | name | size |
|---|---|---|
| dbfs:/datasets/maps/StockholmCentrumWgot.osm | StockholmCentrumWgot.osm | 3820982.0 |
| dbfs:/datasets/maps/TinyUppsalaCentrumWgot.osm | TinyUppsalaCentrumWgot.osm | 919096.0 |
| dbfs:/datasets/maps/UppsalaCentrumWgot.osm | UppsalaCentrumWgot.osm | 8667122.0 |
//read the file from dbfs
val path = "dbfs:/datasets/maps/TinyUppsalaCentrumWgot.osm"
val tinyUppsalaCentrumOsmDF = spark.read
.format("magellan")
.option("type", "osm")
.load(path)
display(tinyUppsalaCentrumOsmDF)
tinyUppsalaCentrumOsmDF.count
res12: Long = 1857
Setting up Leaflet
To use Leaflet independently you need to set up an access token in Mapbox; see the following URLs:
- https://leafletjs.com/examples/quick-start/
- Request an access token:
- https://account.mapbox.com/auth/signin/?route-to=%22/access-tokens/%22
Visualise with Leaflet:
Take an array of Strings in GeoJSON format and insert it into a prebuilt HTML string that contains all the code necessary to display these features using Leaflet. The resulting HTML can be displayed in Databricks using the displayHTML function.
See http://leafletjs.com/examples/geojson.html for a detailed example of using GeoJSON with Leaflet.
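As a tiny pure-Scala sketch of building such a GeoJSON feature string (the helper pointToGeoJson is ours, not part of Magellan or Leaflet; note that GeoJSON uses [longitude, latitude] order, the reverse of Leaflet's L.marker):

```scala
// Hypothetical helper: render a (lat, lon) pair as a GeoJSON Point feature string.
// GeoJSON coordinates are [longitude, latitude] -- the reverse of Leaflet's L.marker([lat, lon]).
def pointToGeoJson(lat: Double, lon: Double, name: String): String =
  s"""{"type":"Feature","geometry":{"type":"Point","coordinates":[$lon,$lat]},"properties":{"name":"$name"}}"""

val feature = pointToGeoJson(59.839264, 17.647075, "SDS")
```

Such a string can be interpolated into the HTML template and handed to Leaflet's L.geoJSON(...) in place of a plain marker.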
//val point1 = sc.parallelize(Seq((59.839264, 17.647075),(59.9, 17.88))).toDF("x", "y")
val point1 = sc.parallelize(Seq((59.839264, 17.647075))).toDF("x", "y")
val point1c = point1.collect()
val string2 = point1c.mkString(",") // renders the Row as "[59.839264,17.647075]"
val string22 = "'random_string'"    // extra text to append to the marker popup
point1: org.apache.spark.sql.DataFrame = [x: double, y: double]
point1c: Array[org.apache.spark.sql.Row] = Array([59.839264,17.647075])
string2: String = [59.839264,17.647075]
string22: String = 'random_string'
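The mkString call above works because Spark's Row.toString renders a row as [v1,v2,...], which happens to also be a valid JavaScript array literal. The same string can be produced without Spark:

```scala
// Build the same "[lat,lon]" literal that mkString over the collected Row produced above.
val coords = Seq(59.839264, 17.647075).mkString("[", ",", "]")
```

coords can be dropped straight into L.marker(...) in the generated JavaScript.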
def genLeafletHTML(): String = {
// Mapbox access token used by the tile layer below
val accessToken = "pk.eyJ1IjoiZHRnIiwiYSI6ImNpaWF6MGdiNDAwanNtemx6MmIyNXoyOWIifQ.ndbNtExCMXZHKyfNtEN0Vg"
val generatedHTML = f"""<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/leaflet/0.7.7/leaflet.css">
<style>#map {width: 600px; height:400px;}</style>
</head>
<body>
<div id="map" style="width: 600px; height: 400px"></div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/leaflet/0.7.7/leaflet.js"></script>
<script type="text/javascript">
var map = L.map('map').setView([59.838, 17.646865], 16);
L.tileLayer('https://api.tiles.mapbox.com/v4/{id}/{z}/{x}/{y}.png?access_token=$accessToken', {
maxZoom: 19
, id: 'mapbox.streets'
, attribution: '<a href="http://openstreetmap.org">OpenStreetMap</a> ' +
'<a href="http://creativecommons.org/licenses/by-sa/2.0/">CC-BY-SA</a> ' +
'| © <a href="http://mapbox.com">Mapbox</a>'
}).addTo(map);
str1 = 'SDS<br>Ångströmlaboratoriet<br>59.839264, 17.647075<br>';
str2 = ${string22};
var popup = str1.concat(str2);
L.marker(${string2}).addTo(map)
.bindPopup(popup)
.openPopup();
</script>
</body>
</html>
"""
generatedHTML
}
displayHTML(genLeafletHTML())
Specifying the time frame we are interested in.
val startTime: Timestamp = Timestamp.valueOf("2008-02-03 00:00:00.0")
val endTime: Timestamp = Timestamp.valueOf("2008-02-03 01:00:00.0")
startTime: java.sql.Timestamp = 2008-02-03 00:00:00.0
endTime: java.sql.Timestamp = 2008-02-03 01:00:00.0
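The space-time query below compares java.sql.Timestamp values against this window. Whether the endpoints are inclusive depends on the getIntersectingTrips implementation; the sketch below (the predicate inWindow is ours, not from the notebook's library) uses a half-open window:

```scala
import java.sql.Timestamp

val start = Timestamp.valueOf("2008-02-03 00:00:00.0")
val end   = Timestamp.valueOf("2008-02-03 01:00:00.0")

// Hypothetical predicate: does a timestamp fall in the half-open window [start, end)?
def inWindow(t: Timestamp): Boolean = !t.before(start) && t.before(end)

inWindow(Timestamp.valueOf("2008-02-03 00:23:17.0")) // inside the hour
inWindow(Timestamp.valueOf("2008-02-03 01:00:00.0")) // excluded under half-open semantics
```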
Now the getIntersectingTrips function can be run to find the data points that intersect the space-time volume.
val intersectingTrips = polygonDF.getIntersectingTrips(taxiDataSparkParquetRead, startTime, endTime) // taxiData
intersectingTrips: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [polygon: polygon, taxiId: int ... 2 more fields]
Here are all the taxi ids that pass through the polygon:
display(intersectingTrips.select($"taxiId", $"timeStamp"))
| taxiId | timeStamp |
|---|---|
| 6568.0 | 2008-02-03T00:09:05.000+0000 |
| 4912.0 | 2008-02-03T00:23:17.000+0000 |
| 4566.0 | 2008-02-03T00:07:54.000+0000 |
| 7989.0 | 2008-02-03T00:33:51.000+0000 |
| 3911.0 | 2008-02-03T00:07:56.000+0000 |
| 9231.0 | 2008-02-03T00:20:51.000+0000 |
| 2751.0 | 2008-02-03T00:44:21.000+0000 |
| 3390.0 | 2008-02-03T00:40:08.000+0000 |
| 1242.0 | 2008-02-03T00:03:38.000+0000 |
| 8177.0 | 2008-02-03T00:20:26.000+0000 |
| 8528.0 | 2008-02-03T00:20:57.000+0000 |
| 1606.0 | 2008-02-03T00:45:28.000+0000 |
| 2917.0 | 2008-02-03T00:28:27.000+0000 |
| 4912.0 | 2008-02-03T00:23:17.000+0000 |
A list of all the distinct taxis that pass through the square:
display(intersectingTrips.select($"taxiId").distinct)
| taxiId |
|---|
| 7989.0 |
| 3390.0 |
| 9231.0 |
| 1242.0 |
| 6568.0 |
| 8177.0 |
| 4912.0 |
| 2751.0 |
| 8528.0 |
| 3911.0 |
| 4566.0 |
| 2917.0 |
| 1606.0 |
display(intersectingTrips.groupBy($"taxiId").count.orderBy(-$"count"))
| taxiId | count |
|---|---|
| 4912.0 | 2.0 |
| 7989.0 | 1.0 |
| 3390.0 | 1.0 |
| 9231.0 | 1.0 |
| 1242.0 | 1.0 |
| 6568.0 | 1.0 |
| 8177.0 | 1.0 |
| 2751.0 | 1.0 |
| 8528.0 | 1.0 |
| 3911.0 | 1.0 |
| 4566.0 | 1.0 |
| 2917.0 | 1.0 |
| 1606.0 | 1.0 |
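The aggregation above is the usual groupBy-then-count pattern; the same idea in plain Scala on toy ids (illustrative data, not the actual trips):

```scala
// Toy taxi ids; 4912 appears twice, as in the table above.
val ids = Seq(4912, 7989, 4912, 3390)
val counts: Map[Int, Int] = ids.groupBy(identity).map { case (id, xs) => id -> xs.size }
```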
Clean up your mess in distributed RAM
taxiDataSparkParquetRead.unpersist()
res29: taxiDataSparkParquetRead.type = [taxiId: int, timeStamp: timestamp ... 1 more field]
taxiDataSpark.unpersist()
res30: taxiDataSpark.type = [taxiId: int, timeStamp: timestamp ... 1 more field]
taxiRepartioned.unpersist()
res31: taxiRepartioned.type = MapPartitionsRDD[95] at repartition at command-2971213210274715:1
taxiData.unpersist()
res32: taxiData.type = [taxiId: int, timeStamp: timestamp ... 1 more field]
ls dbfs:/datasets/t-drive-trips
| path | name | size |
|---|---|---|
| dbfs:/datasets/t-drive-trips/_SUCCESS | _SUCCESS | 0.0 |
| dbfs:/datasets/t-drive-trips/_committed_3926031913428555637 | _committed_3926031913428555637 | 10024.0 |
| dbfs:/datasets/t-drive-trips/_committed_448783018784947015 | _committed_448783018784947015 | 19934.0 |
| dbfs:/datasets/t-drive-trips/_committed_vacuum5877992330899363965 | _committed_vacuum5877992330899363965 | 96.0 |
| dbfs:/datasets/t-drive-trips/_started_448783018784947015 | _started_448783018784947015 | 0.0 |
| dbfs:/datasets/t-drive-trips/part-00000-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-241-1-c000.snappy.parquet | part-00000-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-241-1-c000.snappy.parquet | 3316133.0 |
| dbfs:/datasets/t-drive-trips/part-00001-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-242-1-c000.snappy.parquet | part-00001-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-242-1-c000.snappy.parquet | 3323394.0 |
| dbfs:/datasets/t-drive-trips/part-00002-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-243-1-c000.snappy.parquet | part-00002-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-243-1-c000.snappy.parquet | 3309944.0 |
| dbfs:/datasets/t-drive-trips/part-00003-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-244-1-c000.snappy.parquet | part-00003-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-244-1-c000.snappy.parquet | 3316166.0 |
| dbfs:/datasets/t-drive-trips/part-00004-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-245-1-c000.snappy.parquet | part-00004-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-245-1-c000.snappy.parquet | 3307819.0 |
| dbfs:/datasets/t-drive-trips/part-00005-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-246-1-c000.snappy.parquet | part-00005-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-246-1-c000.snappy.parquet | 3317356.0 |
| dbfs:/datasets/t-drive-trips/part-00006-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-247-1-c000.snappy.parquet | part-00006-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-247-1-c000.snappy.parquet | 3322203.0 |
| dbfs:/datasets/t-drive-trips/part-00007-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-248-1-c000.snappy.parquet | part-00007-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-248-1-c000.snappy.parquet | 3328137.0 |
| dbfs:/datasets/t-drive-trips/part-00008-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-249-1-c000.snappy.parquet | part-00008-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-249-1-c000.snappy.parquet | 3320837.0 |
| dbfs:/datasets/t-drive-trips/part-00009-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-250-1-c000.snappy.parquet | part-00009-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-250-1-c000.snappy.parquet | 3329892.0 |
| dbfs:/datasets/t-drive-trips/part-00010-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-251-1-c000.snappy.parquet | part-00010-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-251-1-c000.snappy.parquet | 3324941.0 |
| dbfs:/datasets/t-drive-trips/part-00011-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-252-1-c000.snappy.parquet | part-00011-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-252-1-c000.snappy.parquet | 3321528.0 |
| dbfs:/datasets/t-drive-trips/part-00012-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-253-1-c000.snappy.parquet | part-00012-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-253-1-c000.snappy.parquet | 3328393.0 |
| dbfs:/datasets/t-drive-trips/part-00013-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-254-1-c000.snappy.parquet | part-00013-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-254-1-c000.snappy.parquet | 3314838.0 |
| dbfs:/datasets/t-drive-trips/part-00014-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-255-1-c000.snappy.parquet | part-00014-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-255-1-c000.snappy.parquet | 3312383.0 |
| dbfs:/datasets/t-drive-trips/part-00015-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-256-1-c000.snappy.parquet | part-00015-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-256-1-c000.snappy.parquet | 3317943.0 |
| dbfs:/datasets/t-drive-trips/part-00016-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-257-1-c000.snappy.parquet | part-00016-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-257-1-c000.snappy.parquet | 3312259.0 |
| dbfs:/datasets/t-drive-trips/part-00017-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-258-1-c000.snappy.parquet | part-00017-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-258-1-c000.snappy.parquet | 3326403.0 |
| dbfs:/datasets/t-drive-trips/part-00018-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-259-1-c000.snappy.parquet | part-00018-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-259-1-c000.snappy.parquet | 3316396.0 |
| dbfs:/datasets/t-drive-trips/part-00019-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-260-1-c000.snappy.parquet | part-00019-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-260-1-c000.snappy.parquet | 3334055.0 |
| dbfs:/datasets/t-drive-trips/part-00020-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-261-1-c000.snappy.parquet | part-00020-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-261-1-c000.snappy.parquet | 3315604.0 |
| dbfs:/datasets/t-drive-trips/part-00021-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-262-1-c000.snappy.parquet | part-00021-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-262-1-c000.snappy.parquet | 3322431.0 |
| dbfs:/datasets/t-drive-trips/part-00022-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-263-1-c000.snappy.parquet | part-00022-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-263-1-c000.snappy.parquet | 3327427.0 |
| dbfs:/datasets/t-drive-trips/part-00023-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-264-1-c000.snappy.parquet | part-00023-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-264-1-c000.snappy.parquet | 3309770.0 |
| dbfs:/datasets/t-drive-trips/part-00024-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-265-1-c000.snappy.parquet | part-00024-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-265-1-c000.snappy.parquet | 3322627.0 |
| dbfs:/datasets/t-drive-trips/part-00025-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-266-1-c000.snappy.parquet | part-00025-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-266-1-c000.snappy.parquet | 3325132.0 |
| dbfs:/datasets/t-drive-trips/part-00026-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-267-1-c000.snappy.parquet | part-00026-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-267-1-c000.snappy.parquet | 3313093.0 |
| dbfs:/datasets/t-drive-trips/part-00027-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-268-1-c000.snappy.parquet | part-00027-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-268-1-c000.snappy.parquet | 3316395.0 |
| dbfs:/datasets/t-drive-trips/part-00028-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-269-1-c000.snappy.parquet | part-00028-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-269-1-c000.snappy.parquet | 3323660.0 |
| dbfs:/datasets/t-drive-trips/part-00029-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-270-1-c000.snappy.parquet | part-00029-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-270-1-c000.snappy.parquet | 3337843.0 |
| dbfs:/datasets/t-drive-trips/part-00030-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-271-1-c000.snappy.parquet | part-00030-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-271-1-c000.snappy.parquet | 3331530.0 |
| dbfs:/datasets/t-drive-trips/part-00031-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-272-1-c000.snappy.parquet | part-00031-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-272-1-c000.snappy.parquet | 3335209.0 |
| dbfs:/datasets/t-drive-trips/part-00032-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-273-1-c000.snappy.parquet | part-00032-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-273-1-c000.snappy.parquet | 3336128.0 |
| dbfs:/datasets/t-drive-trips/part-00033-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-274-1-c000.snappy.parquet | part-00033-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-274-1-c000.snappy.parquet | 3342654.0 |
| dbfs:/datasets/t-drive-trips/part-00034-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-275-1-c000.snappy.parquet | part-00034-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-275-1-c000.snappy.parquet | 3316266.0 |
| dbfs:/datasets/t-drive-trips/part-00035-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-276-1-c000.snappy.parquet | part-00035-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-276-1-c000.snappy.parquet | 3317877.0 |
| dbfs:/datasets/t-drive-trips/part-00036-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-277-1-c000.snappy.parquet | part-00036-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-277-1-c000.snappy.parquet | 3322694.0 |
| dbfs:/datasets/t-drive-trips/part-00037-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-278-1-c000.snappy.parquet | part-00037-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-278-1-c000.snappy.parquet | 3330480.0 |
| dbfs:/datasets/t-drive-trips/part-00038-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-279-1-c000.snappy.parquet | part-00038-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-279-1-c000.snappy.parquet | 3312271.0 |
| dbfs:/datasets/t-drive-trips/part-00039-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-280-1-c000.snappy.parquet | part-00039-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-280-1-c000.snappy.parquet | 3314031.0 |
| dbfs:/datasets/t-drive-trips/part-00040-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-281-1-c000.snappy.parquet | part-00040-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-281-1-c000.snappy.parquet | 3331866.0 |
| dbfs:/datasets/t-drive-trips/part-00041-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-282-1-c000.snappy.parquet | part-00041-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-282-1-c000.snappy.parquet | 3322115.0 |
| dbfs:/datasets/t-drive-trips/part-00042-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-283-1-c000.snappy.parquet | part-00042-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-283-1-c000.snappy.parquet | 3326874.0 |
| dbfs:/datasets/t-drive-trips/part-00043-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-284-1-c000.snappy.parquet | part-00043-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-284-1-c000.snappy.parquet | 3327994.0 |
| dbfs:/datasets/t-drive-trips/part-00044-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-285-1-c000.snappy.parquet | part-00044-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-285-1-c000.snappy.parquet | 3330087.0 |
| dbfs:/datasets/t-drive-trips/part-00045-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-286-1-c000.snappy.parquet | part-00045-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-286-1-c000.snappy.parquet | 3328726.0 |
| dbfs:/datasets/t-drive-trips/part-00046-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-287-1-c000.snappy.parquet | part-00046-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-287-1-c000.snappy.parquet | 3321983.0 |
| dbfs:/datasets/t-drive-trips/part-00047-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-288-1-c000.snappy.parquet | part-00047-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-288-1-c000.snappy.parquet | 3332147.0 |
| dbfs:/datasets/t-drive-trips/part-00048-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-289-1-c000.snappy.parquet | part-00048-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-289-1-c000.snappy.parquet | 3332842.0 |
| dbfs:/datasets/t-drive-trips/part-00049-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-290-1-c000.snappy.parquet | part-00049-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-290-1-c000.snappy.parquet | 3323693.0 |
| dbfs:/datasets/t-drive-trips/part-00050-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-291-1-c000.snappy.parquet | part-00050-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-291-1-c000.snappy.parquet | 3333414.0 |
| dbfs:/datasets/t-drive-trips/part-00051-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-292-1-c000.snappy.parquet | part-00051-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-292-1-c000.snappy.parquet | 3303953.0 |
| dbfs:/datasets/t-drive-trips/part-00052-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-293-1-c000.snappy.parquet | part-00052-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-293-1-c000.snappy.parquet | 3338614.0 |
| dbfs:/datasets/t-drive-trips/part-00053-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-294-1-c000.snappy.parquet | part-00053-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-294-1-c000.snappy.parquet | 3330205.0 |
| dbfs:/datasets/t-drive-trips/part-00054-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-295-1-c000.snappy.parquet | part-00054-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-295-1-c000.snappy.parquet | 3306341.0 |
| dbfs:/datasets/t-drive-trips/part-00055-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-296-1-c000.snappy.parquet | part-00055-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-296-1-c000.snappy.parquet | 3333542.0 |
| dbfs:/datasets/t-drive-trips/part-00056-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-297-1-c000.snappy.parquet | part-00056-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-297-1-c000.snappy.parquet | 3315821.0 |
| dbfs:/datasets/t-drive-trips/part-00057-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-298-1-c000.snappy.parquet | part-00057-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-298-1-c000.snappy.parquet | 3332481.0 |
| dbfs:/datasets/t-drive-trips/part-00058-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-299-1-c000.snappy.parquet | part-00058-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-299-1-c000.snappy.parquet | 3338906.0 |
| dbfs:/datasets/t-drive-trips/part-00059-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-300-1-c000.snappy.parquet | part-00059-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-300-1-c000.snappy.parquet | 3296655.0 |
| dbfs:/datasets/t-drive-trips/part-00060-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-301-1-c000.snappy.parquet | part-00060-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-301-1-c000.snappy.parquet | 3324196.0 |
| dbfs:/datasets/t-drive-trips/part-00061-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-302-1-c000.snappy.parquet | part-00061-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-302-1-c000.snappy.parquet | 3328037.0 |
| dbfs:/datasets/t-drive-trips/part-00062-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-303-1-c000.snappy.parquet | part-00062-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-303-1-c000.snappy.parquet | 3304713.0 |
| dbfs:/datasets/t-drive-trips/part-00063-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-304-1-c000.snappy.parquet | part-00063-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-304-1-c000.snappy.parquet | 3322291.0 |
| dbfs:/datasets/t-drive-trips/part-00064-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-305-1-c000.snappy.parquet | part-00064-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-305-1-c000.snappy.parquet | 3315149.0 |
| dbfs:/datasets/t-drive-trips/part-00065-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-306-1-c000.snappy.parquet | part-00065-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-306-1-c000.snappy.parquet | 3331060.0 |
| dbfs:/datasets/t-drive-trips/part-00066-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-307-1-c000.snappy.parquet | part-00066-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-307-1-c000.snappy.parquet | 3319447.0 |
| dbfs:/datasets/t-drive-trips/part-00067-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-308-1-c000.snappy.parquet | part-00067-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-308-1-c000.snappy.parquet | 3302431.0 |
| dbfs:/datasets/t-drive-trips/part-00068-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-309-1-c000.snappy.parquet | part-00068-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-309-1-c000.snappy.parquet | 3318678.0 |
| dbfs:/datasets/t-drive-trips/part-00069-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-310-1-c000.snappy.parquet | part-00069-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-310-1-c000.snappy.parquet | 3310107.0 |
| dbfs:/datasets/t-drive-trips/part-00070-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-311-1-c000.snappy.parquet | part-00070-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-311-1-c000.snappy.parquet | 3332591.0 |
| dbfs:/datasets/t-drive-trips/part-00071-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-312-1-c000.snappy.parquet | part-00071-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-312-1-c000.snappy.parquet | 3313772.0 |
| dbfs:/datasets/t-drive-trips/part-00072-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-313-1-c000.snappy.parquet | part-00072-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-313-1-c000.snappy.parquet | 3317966.0 |
| dbfs:/datasets/t-drive-trips/part-00073-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-314-1-c000.snappy.parquet | part-00073-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-314-1-c000.snappy.parquet | 3324060.0 |
| dbfs:/datasets/t-drive-trips/part-00074-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-315-1-c000.snappy.parquet | part-00074-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-315-1-c000.snappy.parquet | 3333476.0 |
| dbfs:/datasets/t-drive-trips/part-00075-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-316-1-c000.snappy.parquet | part-00075-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-316-1-c000.snappy.parquet | 3303679.0 |
| dbfs:/datasets/t-drive-trips/part-00076-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-317-1-c000.snappy.parquet | part-00076-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-317-1-c000.snappy.parquet | 3328348.0 |
| dbfs:/datasets/t-drive-trips/part-00077-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-318-1-c000.snappy.parquet | part-00077-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-318-1-c000.snappy.parquet | 3313956.0 |
| dbfs:/datasets/t-drive-trips/part-00078-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-319-1-c000.snappy.parquet | part-00078-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-319-1-c000.snappy.parquet | 3312261.0 |
| dbfs:/datasets/t-drive-trips/part-00079-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-320-1-c000.snappy.parquet | part-00079-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-320-1-c000.snappy.parquet | 3328554.0 |
| dbfs:/datasets/t-drive-trips/part-00080-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-321-1-c000.snappy.parquet | part-00080-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-321-1-c000.snappy.parquet | 3324701.0 |
| dbfs:/datasets/t-drive-trips/part-00081-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-322-1-c000.snappy.parquet | part-00081-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-322-1-c000.snappy.parquet | 3320848.0 |
| dbfs:/datasets/t-drive-trips/part-00082-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-323-1-c000.snappy.parquet | part-00082-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-323-1-c000.snappy.parquet | 3328741.0 |
| dbfs:/datasets/t-drive-trips/part-00083-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-324-1-c000.snappy.parquet | part-00083-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-324-1-c000.snappy.parquet | 3316403.0 |
| dbfs:/datasets/t-drive-trips/part-00084-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-325-1-c000.snappy.parquet | part-00084-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-325-1-c000.snappy.parquet | 3315476.0 |
| dbfs:/datasets/t-drive-trips/part-00085-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-326-1-c000.snappy.parquet | part-00085-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-326-1-c000.snappy.parquet | 3330775.0 |
| dbfs:/datasets/t-drive-trips/part-00086-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-327-1-c000.snappy.parquet | part-00086-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-327-1-c000.snappy.parquet | 3340227.0 |
| dbfs:/datasets/t-drive-trips/part-00087-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-328-1-c000.snappy.parquet | part-00087-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-328-1-c000.snappy.parquet | 3307622.0 |
| dbfs:/datasets/t-drive-trips/part-00088-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-329-1-c000.snappy.parquet | part-00088-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-329-1-c000.snappy.parquet | 3316059.0 |
| dbfs:/datasets/t-drive-trips/part-00089-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-330-1-c000.snappy.parquet | part-00089-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-330-1-c000.snappy.parquet | 3320127.0 |
| dbfs:/datasets/t-drive-trips/part-00090-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-331-1-c000.snappy.parquet | part-00090-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-331-1-c000.snappy.parquet | 3326165.0 |
| dbfs:/datasets/t-drive-trips/part-00091-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-332-1-c000.snappy.parquet | part-00091-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-332-1-c000.snappy.parquet | 3333324.0 |
| dbfs:/datasets/t-drive-trips/part-00092-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-333-1-c000.snappy.parquet | part-00092-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-333-1-c000.snappy.parquet | 3326087.0 |
| dbfs:/datasets/t-drive-trips/part-00093-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-334-1-c000.snappy.parquet | part-00093-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-334-1-c000.snappy.parquet | 3305959.0 |
| dbfs:/datasets/t-drive-trips/part-00094-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-335-1-c000.snappy.parquet | part-00094-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-335-1-c000.snappy.parquet | 3321270.0 |
| dbfs:/datasets/t-drive-trips/part-00095-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-336-1-c000.snappy.parquet | part-00095-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-336-1-c000.snappy.parquet | 3309747.0 |
| dbfs:/datasets/t-drive-trips/part-00096-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-337-1-c000.snappy.parquet | part-00096-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-337-1-c000.snappy.parquet | 3329141.0 |
| dbfs:/datasets/t-drive-trips/part-00097-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-338-1-c000.snappy.parquet | part-00097-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-338-1-c000.snappy.parquet | 3328356.0 |
| dbfs:/datasets/t-drive-trips/part-00098-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-339-1-c000.snappy.parquet | part-00098-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-339-1-c000.snappy.parquet | 3315952.0 |
| dbfs:/datasets/t-drive-trips/part-00099-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-340-1-c000.snappy.parquet | part-00099-tid-448783018784947015-d4c17528-9029-4726-9ba2-22adc6af2b68-340-1-c000.snappy.parquet | 3323318.0 |
ls dbfs:/datasets/t-drive-trips-magellan
| path | name | size |
|---|---|---|
| dbfs:/datasets/t-drive-trips-magellan/_committed_3681160352079832245 | _committed_3681160352079832245 | 10024.0 |
| dbfs:/datasets/t-drive-trips-magellan/_committed_5210346527157697119 | _committed_5210346527157697119 | 19634.0 |
| dbfs:/datasets/t-drive-trips-magellan/_committed_5745945769935478881 | _committed_5745945769935478881 | 19623.0 |
| dbfs:/datasets/t-drive-trips-magellan/_committed_6691263342585570063 | _committed_6691263342585570063 | 19623.0 |
| dbfs:/datasets/t-drive-trips-magellan/_committed_vacuum5880288197393177303 | _committed_vacuum5880288197393177303 | 129.0 |
| dbfs:/datasets/t-drive-trips-magellan/_started_5745945769935478881 | _started_5745945769935478881 | 0.0 |
| dbfs:/datasets/t-drive-trips-magellan/_started_6691263342585570063 | _started_6691263342585570063 | 0.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00000-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-745-1-c000.snappy.parquet | part-00000-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-745-1-c000.snappy.parquet | 6116439.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00001-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-747-1-c000.snappy.parquet | part-00001-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-747-1-c000.snappy.parquet | 6109529.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00002-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-749-1-c000.snappy.parquet | part-00002-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-749-1-c000.snappy.parquet | 6127623.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00003-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-750-1-c000.snappy.parquet | part-00003-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-750-1-c000.snappy.parquet | 6075264.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00004-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-751-1-c000.snappy.parquet | part-00004-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-751-1-c000.snappy.parquet | 6121668.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00005-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-752-1-c000.snappy.parquet | part-00005-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-752-1-c000.snappy.parquet | 6160575.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00006-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-753-1-c000.snappy.parquet | part-00006-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-753-1-c000.snappy.parquet | 6128421.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00007-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-754-1-c000.snappy.parquet | part-00007-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-754-1-c000.snappy.parquet | 6094968.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00008-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-755-1-c000.snappy.parquet | part-00008-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-755-1-c000.snappy.parquet | 6177364.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00009-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-756-1-c000.snappy.parquet | part-00009-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-756-1-c000.snappy.parquet | 6156075.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00010-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-757-1-c000.snappy.parquet | part-00010-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-757-1-c000.snappy.parquet | 6128188.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00011-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-758-1-c000.snappy.parquet | part-00011-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-758-1-c000.snappy.parquet | 6087318.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00012-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-759-1-c000.snappy.parquet | part-00012-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-759-1-c000.snappy.parquet | 6163969.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00013-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-760-1-c000.snappy.parquet | part-00013-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-760-1-c000.snappy.parquet | 6191786.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00014-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-761-1-c000.snappy.parquet | part-00014-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-761-1-c000.snappy.parquet | 6100593.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00015-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-762-1-c000.snappy.parquet | part-00015-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-762-1-c000.snappy.parquet | 6143283.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00016-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-763-1-c000.snappy.parquet | part-00016-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-763-1-c000.snappy.parquet | 6179004.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00017-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-766-1-c000.snappy.parquet | part-00017-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-766-1-c000.snappy.parquet | 6109483.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00018-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-767-1-c000.snappy.parquet | part-00018-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-767-1-c000.snappy.parquet | 6091116.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00019-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-768-1-c000.snappy.parquet | part-00019-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-768-1-c000.snappy.parquet | 6175989.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00020-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-770-1-c000.snappy.parquet | part-00020-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-770-1-c000.snappy.parquet | 6164017.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00021-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-772-1-c000.snappy.parquet | part-00021-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-772-1-c000.snappy.parquet | 6086937.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00022-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-773-1-c000.snappy.parquet | part-00022-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-773-1-c000.snappy.parquet | 6121136.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00023-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-776-1-c000.snappy.parquet | part-00023-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-776-1-c000.snappy.parquet | 6113180.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00024-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-777-1-c000.snappy.parquet | part-00024-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-777-1-c000.snappy.parquet | 6141102.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00025-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-778-1-c000.snappy.parquet | part-00025-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-778-1-c000.snappy.parquet | 6107475.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00026-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-779-1-c000.snappy.parquet | part-00026-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-779-1-c000.snappy.parquet | 6108195.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00027-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-780-1-c000.snappy.parquet | part-00027-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-780-1-c000.snappy.parquet | 6145437.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00028-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-781-1-c000.snappy.parquet | part-00028-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-781-1-c000.snappy.parquet | 6108490.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00029-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-782-1-c000.snappy.parquet | part-00029-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-782-1-c000.snappy.parquet | 6172917.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00030-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-783-1-c000.snappy.parquet | part-00030-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-783-1-c000.snappy.parquet | 6162200.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00031-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-784-1-c000.snappy.parquet | part-00031-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-784-1-c000.snappy.parquet | 6034541.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00032-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-785-1-c000.snappy.parquet | part-00032-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-785-1-c000.snappy.parquet | 6178715.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00033-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-786-1-c000.snappy.parquet | part-00033-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-786-1-c000.snappy.parquet | 6045366.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00034-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-787-1-c000.snappy.parquet | part-00034-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-787-1-c000.snappy.parquet | 6055861.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00035-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-788-1-c000.snappy.parquet | part-00035-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-788-1-c000.snappy.parquet | 6102537.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00036-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-789-1-c000.snappy.parquet | part-00036-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-789-1-c000.snappy.parquet | 6146001.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00037-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-790-1-c000.snappy.parquet | part-00037-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-790-1-c000.snappy.parquet | 6115954.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00038-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-791-1-c000.snappy.parquet | part-00038-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-791-1-c000.snappy.parquet | 6189674.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00039-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-792-1-c000.snappy.parquet | part-00039-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-792-1-c000.snappy.parquet | 6125360.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00040-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-793-1-c000.snappy.parquet | part-00040-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-793-1-c000.snappy.parquet | 6129475.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00041-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-794-1-c000.snappy.parquet | part-00041-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-794-1-c000.snappy.parquet | 6096233.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00042-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-795-1-c000.snappy.parquet | part-00042-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-795-1-c000.snappy.parquet | 6102240.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00043-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-796-1-c000.snappy.parquet | part-00043-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-796-1-c000.snappy.parquet | 6092224.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00044-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-797-1-c000.snappy.parquet | part-00044-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-797-1-c000.snappy.parquet | 6150214.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00045-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-798-1-c000.snappy.parquet | part-00045-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-798-1-c000.snappy.parquet | 6154492.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00046-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-799-1-c000.snappy.parquet | part-00046-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-799-1-c000.snappy.parquet | 6075132.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00047-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-800-1-c000.snappy.parquet | part-00047-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-800-1-c000.snappy.parquet | 6159253.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00048-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-801-1-c000.snappy.parquet | part-00048-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-801-1-c000.snappy.parquet | 6147865.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00049-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-802-1-c000.snappy.parquet | part-00049-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-802-1-c000.snappy.parquet | 6109401.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00050-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-803-1-c000.snappy.parquet | part-00050-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-803-1-c000.snappy.parquet | 6098660.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00051-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-804-1-c000.snappy.parquet | part-00051-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-804-1-c000.snappy.parquet | 6065365.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00052-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-805-1-c000.snappy.parquet | part-00052-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-805-1-c000.snappy.parquet | 6166406.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00053-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-806-1-c000.snappy.parquet | part-00053-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-806-1-c000.snappy.parquet | 6123940.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00054-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-807-1-c000.snappy.parquet | part-00054-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-807-1-c000.snappy.parquet | 6182341.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00055-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-808-1-c000.snappy.parquet | part-00055-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-808-1-c000.snappy.parquet | 6107282.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00056-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-809-1-c000.snappy.parquet | part-00056-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-809-1-c000.snappy.parquet | 6114309.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00057-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-810-1-c000.snappy.parquet | part-00057-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-810-1-c000.snappy.parquet | 6151453.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00058-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-811-1-c000.snappy.parquet | part-00058-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-811-1-c000.snappy.parquet | 6203356.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00059-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-812-1-c000.snappy.parquet | part-00059-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-812-1-c000.snappy.parquet | 6113065.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00060-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-813-1-c000.snappy.parquet | part-00060-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-813-1-c000.snappy.parquet | 6101909.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00061-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-814-1-c000.snappy.parquet | part-00061-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-814-1-c000.snappy.parquet | 6134861.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00062-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-815-1-c000.snappy.parquet | part-00062-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-815-1-c000.snappy.parquet | 6106498.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00063-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-816-1-c000.snappy.parquet | part-00063-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-816-1-c000.snappy.parquet | 6147787.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00064-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-817-1-c000.snappy.parquet | part-00064-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-817-1-c000.snappy.parquet | 6100162.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00065-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-818-1-c000.snappy.parquet | part-00065-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-818-1-c000.snappy.parquet | 6108290.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00066-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-819-1-c000.snappy.parquet | part-00066-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-819-1-c000.snappy.parquet | 6089103.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00067-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-820-1-c000.snappy.parquet | part-00067-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-820-1-c000.snappy.parquet | 6166248.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00068-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-821-1-c000.snappy.parquet | part-00068-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-821-1-c000.snappy.parquet | 6115293.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00069-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-822-1-c000.snappy.parquet | part-00069-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-822-1-c000.snappy.parquet | 6076361.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00070-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-746-1-c000.snappy.parquet | part-00070-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-746-1-c000.snappy.parquet | 6140281.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00071-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-748-1-c000.snappy.parquet | part-00071-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-748-1-c000.snappy.parquet | 6082413.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00072-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-823-1-c000.snappy.parquet | part-00072-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-823-1-c000.snappy.parquet | 6018901.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00073-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-824-1-c000.snappy.parquet | part-00073-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-824-1-c000.snappy.parquet | 6170865.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00074-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-825-1-c000.snappy.parquet | part-00074-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-825-1-c000.snappy.parquet | 6158579.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00075-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-826-1-c000.snappy.parquet | part-00075-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-826-1-c000.snappy.parquet | 6180276.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00076-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-827-1-c000.snappy.parquet | part-00076-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-827-1-c000.snappy.parquet | 6090038.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00077-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-828-1-c000.snappy.parquet | part-00077-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-828-1-c000.snappy.parquet | 6014542.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00078-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-829-1-c000.snappy.parquet | part-00078-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-829-1-c000.snappy.parquet | 6053589.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00079-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-830-1-c000.snappy.parquet | part-00079-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-830-1-c000.snappy.parquet | 6151854.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00080-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-831-1-c000.snappy.parquet | part-00080-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-831-1-c000.snappy.parquet | 6108894.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00081-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-832-1-c000.snappy.parquet | part-00081-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-832-1-c000.snappy.parquet | 6182298.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00082-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-833-1-c000.snappy.parquet | part-00082-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-833-1-c000.snappy.parquet | 6143713.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00083-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-834-1-c000.snappy.parquet | part-00083-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-834-1-c000.snappy.parquet | 6100835.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00084-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-835-1-c000.snappy.parquet | part-00084-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-835-1-c000.snappy.parquet | 6112446.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00085-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-836-1-c000.snappy.parquet | part-00085-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-836-1-c000.snappy.parquet | 6113911.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00086-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-764-1-c000.snappy.parquet | part-00086-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-764-1-c000.snappy.parquet | 6046043.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00087-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-765-1-c000.snappy.parquet | part-00087-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-765-1-c000.snappy.parquet | 6129799.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00088-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-837-1-c000.snappy.parquet | part-00088-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-837-1-c000.snappy.parquet | 6143620.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00089-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-838-1-c000.snappy.parquet | part-00089-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-838-1-c000.snappy.parquet | 6122388.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00090-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-769-1-c000.snappy.parquet | part-00090-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-769-1-c000.snappy.parquet | 6110756.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00091-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-771-1-c000.snappy.parquet | part-00091-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-771-1-c000.snappy.parquet | 6114874.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00092-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-839-1-c000.snappy.parquet | part-00092-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-839-1-c000.snappy.parquet | 6095812.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00093-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-840-1-c000.snappy.parquet | part-00093-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-840-1-c000.snappy.parquet | 6061281.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00094-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-774-1-c000.snappy.parquet | part-00094-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-774-1-c000.snappy.parquet | 6202331.0 |
| dbfs:/datasets/t-drive-trips-magellan/part-00095-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-775-1-c000.snappy.parquet | part-00095-tid-5745945769935478881-51a98f06-4547-48c2-a4b8-a36b55ffe594-775-1-c000.snappy.parquet | 6130937.0 |
This is part of Project MEP: Meme Evolution Programme and is supported by the Databricks academic partners program.
Map-matching Noisy Spatial Trajectories of Vehicles to Roadways in Open Street Map
Dillon George, Dan Lilja and Raazesh Sainudiin
Copyright 2016-2019 Dillon George, Dan Lilja and Raazesh Sainudiin
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
This is the precursor 2016 presentation by Dillon George, a student project from Scalable Data Science from Middle Earth.
Here we are updating it to more recent versions of the needed libraries.
What is map-matching?
Map matching is the problem of how to match recorded geographic coordinates to a logical model of the real world, typically using some form of Geographic Information System. See https://en.wikipedia.org/wiki/Map_matching.
//This allows easy embedding of publicly available information into any other notebook
//when viewing in git-book just ignore this block - you may have to manually chase the URL in frameIt("URL").
//Example usage:
// displayHTML(frameIt("https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation#Topics_in_LDA",250))
def frameIt( u:String, h:Int ) : String = {
"""<iframe
src=""""+ u+""""
width="95%" height="""" + h + """"
sandbox>
<p>
<a href="http://spark.apache.org/docs/latest/index.html">
Fallback link for browsers that, unlikely, don't support frames
</a>
</p>
</iframe>"""
}
displayHTML(frameIt("https://en.wikipedia.org/wiki/Map_matching",600))
Why are we interested in map-matching?
Mainly because we can naturally deal with noise in raw GPS trajectories of entities moving along mapped ways, such as, vehicles, pedestrians or cyclists.
- Trajectories from sources like Uber are typically noisy and we will map-match such trajectories in this worksheet.
- Often, such trajectories lead to significant graph-dimensionality reduction as you will see below.
- More importantly, map-matching is a natural first step towards learning distributions over historical trajectories of an entity.
- Moreover, a set of map-matched trajectories (with additional work using kNN operations) can be turned into a graphX graph that can be vertex-programmed and joined with other graphX representations of the map itself.
How are we map-matching?
We are using GraphHopper for this for now. See https://en.wikipedia.org/wiki/GraphHopper.
The following alternatives need exploration:
- BMW's barefoot on OSM (with Spark integration)
- https://github.com/bmwcarit/barefoot
- http://www.bmw-carit.com/blog/barefoot-release-an-open-source-java-library-for-map-matching-with-openstreetmap/ which seems to use a Hidden Markov Model from Microsoft Research.
The basic steps are the following:
- Step 0 (Preliminaries): attach the needed libraries, load the OSM data and initialise GraphHopper
  - the two sub-steps 0.1 and 0.2 need to be done only once per cluster
- Set up Leaflet for visualisation
- Load the table of Uber data from the earlier analysis, then convert it to an RDD for map-matching
- Start map-matching
- Display the results of a map-matched trajectory
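Before walking through the notebook cells, here is a rough sketch of what the GraphHopper map-matching call sequence looks like with `com.graphhopper:map-matching:0.6.0`. The OSM file path, cache location and trajectory values are placeholders, and the exact 0.6.0 signatures should be double-checked against the library before relying on this:

```scala
import com.graphhopper.GraphHopper
import com.graphhopper.matching.MapMatching
import com.graphhopper.routing.util.{CarFlagEncoder, EncodingManager}
import com.graphhopper.storage.index.LocationIndexTree
import com.graphhopper.util.GPXEntry
import scala.collection.JavaConverters._

// build (or load a cached) routing graph from an OSM extract -- path is hypothetical
val encoder = new CarFlagEncoder()
val hopper = new GraphHopper()
hopper.setOSMFile("/dbfs/datasets/maps/SanFranciscoSmall.osm.pbf") // placeholder path
hopper.setGraphHopperLocation("/tmp/graphhopper-cache")
hopper.setEncodingManager(new EncodingManager(encoder))
hopper.importOrLoad()

// spatial index and matcher over the imported road graph
val graph = hopper.getGraphHopperStorage()
val index = new LocationIndexTree(graph, graph.getDirectory())
index.prepareIndex()
val mapMatching = new MapMatching(graph, index, encoder)

// a trajectory is a time-ordered list of (lat, lon, millis) GPS fixes
val trajectory = List(
  new GPXEntry(37.75, -122.44, 0L),
  new GPXEntry(37.76, -122.43, 60000L)
).asJava

// snap the noisy fixes onto OSM ways; returns a MatchResult over road edges
val matched = mapMatching.doWork(trajectory)
```

The rest of the notebook does essentially this, but with the graph built once per cluster (Step 0) and the trajectories coming from the Uber table.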
- Preliminaries
Loading required libraries
- Launch a cluster using Spark 2.4.3 (this is for compatibility with magellan built from the forked repos; see the first notebook in this folder!).
- Attach the following libraries if you have not already done so:
  - map-matching: `com.graphhopper:map-matching:0.6.0` (more recent versions may work but are not tested yet!)
  - magellan: import the custom-built jar by downloading it locally from https://github.com/lamastex/scalable-data-science/blob/master/custom-builds/jars/magellan/forks/ and then uploading it to databricks
  - if needed only (this is already in databricks): spray-json `io.spray:spray-json_2.11:1.3.4`
import com.graphhopper.matching._
import com.graphhopper._
import com.graphhopper.routing.util.{EncodingManager, CarFlagEncoder}
import com.graphhopper.storage.index.LocationIndexTree
import com.graphhopper.util.GPXEntry
import magellan.Point
import scala.collection.JavaConverters._
import spray.json._
import DefaultJsonProtocol._
import scala.util.{Try, Success, Failure}
import org.apache.spark.sql.functions._
Do Step 0 at the bottom of the notebook
This is needed only once per shard per OSM file (ignore this step the second time!):
- follow the section below on Step 0.1: Loading our OSM Data
- follow the section below on Step 0.2: Initialising GraphHopper
NOTE
If you loaded a smaller map so as to be able to analyze it in the community edition, then you need the bounding box of this map to keep only those trajectories that fall within this smaller map.
For example, the SanFranciscoSmall OSM map has the following bounding box corners, in (longitude, latitude) order:
(-122.449, 37.747) and (-122.397, 37.772)
Let's put them in Scala vals as follows (note: since the corners are given in (longitude, latitude) order, the first pair of values are longitudes and the second pair latitudes, so the `Lat`/`Lon` parts of the names below are swapped; the names are kept as-is for compatibility with the cells that use them):
val minLatInOSMMap = -122.449
val minLonInOSMMap = 37.747
val maxLatInOSMMap = -122.397
val maxLonInOSMMap = 37.772
minLatInOSMMap: Double = -122.449
minLonInOSMMap: Double = 37.747
maxLatInOSMMap: Double = -122.397
maxLonInOSMMap: Double = 37.772
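As a sketch of how such a bounding-box filter could look, here is a small predicate in plain Scala that keeps only GPS fixes inside the SanFranciscoSmall box (longitudes in [-122.449, -122.397], latitudes in [37.747, 37.772]); the sample trajectory values are made up for illustration:

```scala
// bounding box of the SanFranciscoSmall OSM extract, in (longitude, latitude)
val (minLon, maxLon) = (-122.449, -122.397)
val (minLat, maxLat) = (37.747, 37.772)

// true iff a GPS fix falls inside the extract's bounding box
def inBBox(lat: Double, lon: Double): Boolean =
  lon >= minLon && lon <= maxLon && lat >= minLat && lat <= maxLat

// e.g. keep only in-box fixes of a trajectory of (lat, lon, millis) triples
val trajectory = Seq((37.75, -122.42, 0L), (37.75, -122.30, 60000L))
val inside = trajectory.filter { case (lat, lon, _) => inBBox(lat, lon) }
// only the first fix survives: -122.30 lies east of maxLon
```

In the actual pipeline the same predicate would be applied to the trajectory DataFrame/RDD before handing the points to the map-matcher.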
- Setting up leaflet and visualisation
2.1 Setting up leaflet
To use Leaflet independently, you need to go to the following URL and set up an access token in Mapbox:
- https://leafletjs.com/examples/quick-start/
- Request an access token:
- https://account.mapbox.com/auth/signin/?route-to=%22/access-tokens/%22
2.2 Visualising with leaflet
Take an array of Strings in 'GeoJson' format and insert it into a prebuilt HTML string that contains all the code necessary to display these features using Leaflet. The resulting HTML can be displayed in databricks using the displayHTML function.
See http://leafletjs.com/examples/geojson.html for a detailed example of using GeoJson with Leaflet.
def genLeafletHTML(features: Array[String]): String = {
val featureArray = features.reduce(_ + "," + _)
// get your own access-token from https://leafletjs.com/examples/quick-start/
// see request-access token link above at: https://account.mapbox.com/auth/signin/?route-to=%22/access-tokens/%22
val accessToken = "pk.eyJ1Ijoic3RhdnJvdWxhdmxhY2hvdSIsImEiOiJjbDEzY3EwNDcycjBzM2JrYnBuemx4bmZkIn0.2DhL_f07vB0i7psep_QR8Q"
val generatedHTML = f"""<!DOCTYPE html>
<html>
<head>
<title>Maps</title>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/leaflet/0.7.7/leaflet.css">
<style>
#map {width: 600px; height:400px;}
</style>
</head>
<body>
<div id="map" style="width: 1000px; height: 600px"></div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/leaflet/0.7.7/leaflet.js"></script>
<script type="text/javascript">
var map = L.map('map').setView([37.77471008393265, -122.40422604391485], 14);
L.tileLayer('https://api.tiles.mapbox.com/v4/{id}/{z}/{x}/{y}.png?access_token=$accessToken', {
maxZoom: 18,
attribution: 'Map data © <a href="http://openstreetmap.org">OpenStreetMap</a> contributors, ' +
'<a href="http://creativecommons.org/licenses/by-sa/2.0/">CC-BY-SA</a>, ' +
'Imagery © <a href="http://mapbox.com">Mapbox</a>',
id: 'mapbox.streets'
}).addTo(map);
var features = [$featureArray];
colors = features.map(function (_) {return rainbow(100, Math.floor(Math.random() * 100)); });
for (var i = 0; i < features.length; i++) {
console.log(i);
L.geoJson(features[i], {
pointToLayer: function (feature, latlng) {
return L.circleMarker(latlng, {
radius: 4,
fillColor: colors[i],
color: colors[i],
weight: 1,
opacity: 1,
fillOpacity: 0.8
});
}
}).addTo(map);
}
function rainbow(numOfSteps, step) {
// This function generates vibrant, "evenly spaced" colours (i.e. no clustering). This is ideal for creating easily distinguishable vibrant markers in Google Maps and other apps.
// Adam Cole, 2011-Sept-14
// HSV to RBG adapted from: http://mjijackson.com/2008/02/rgb-to-hsl-and-rgb-to-hsv-color-model-conversion-algorithms-in-javascript
var r, g, b;
var h = step / numOfSteps;
var i = ~~(h * 6);
var f = h * 6 - i;
var q = 1 - f;
switch(i %% 6){
case 0: r = 1; g = f; b = 0; break;
case 1: r = q; g = 1; b = 0; break;
case 2: r = 0; g = 1; b = f; break;
case 3: r = 0; g = q; b = 1; break;
case 4: r = f; g = 0; b = 1; break;
case 5: r = 1; g = 0; b = q; break;
}
var c = "#" + ("00" + (~ ~(r * 255)).toString(16)).slice(-2) + ("00" + (~ ~(g * 255)).toString(16)).slice(-2) + ("00" + (~ ~(b * 255)).toString(16)).slice(-2);
return (c);
}
</script>
</body>
"""
generatedHTML
}
genLeafletHTML: (features: Array[String])String
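The `rainbow` helper embedded in the generated HTML above (note the `%%` escapes it needs inside Scala's f-interpolator) is plain HSV-to-RGB stepping, and can be sketched as ordinary Scala. This is a hypothetical port for illustration, not part of the notebook's pipeline:

```scala
// Port of the JavaScript rainbow() helper: map step/numOfSteps to a hue,
// walk the six HSV sextants, and emit a #rrggbb hex colour string
def rainbow(numOfSteps: Int, step: Int): String = {
  val h = step.toDouble / numOfSteps
  val i = (h * 6).toInt           // which sextant of the colour wheel
  val f = h * 6 - i               // position within the sextant
  val q = 1 - f
  val (r, g, b) = (i % 6) match {
    case 0 => (1.0, f, 0.0)
    case 1 => (q, 1.0, 0.0)
    case 2 => (0.0, 1.0, f)
    case 3 => (0.0, q, 1.0)
    case 4 => (f, 0.0, 1.0)
    case 5 => (1.0, 0.0, q)
  }
  // two-digit zero-padded hex for each channel
  def hex(v: Double) = ("00" + (v * 255).toInt.toHexString).takeRight(2)
  "#" + hex(r) + hex(g) + hex(b)
}

rainbow(100, 0)  // "#ff0000" -- pure red at step 0
rainbow(100, 50) // "#00ffff" -- cyan at the midpoint
```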
- Load the Uber data as in the earlier analysis, then convert it to an RDD for map matching.
case class UberRecord(tripId: Int, time: String, latlon: Array[Double])
val uberData = sc.textFile("dbfs:/datasets/magellan/all.tsv").map { line =>
val parts = line.split("\t" )
val tripId = parts(0).toInt
val time = parts(1)
val latlon = Array(parts(3).toDouble, parts(2).toDouble)
UberRecord(tripId, time, latlon)
}.
repartition(100).
toDF().
select($"tripId", to_utc_timestamp($"time", "yyyy-MM-dd'T'HH:mm:ss").as("timeStamp"), $"latlon").
cache()
defined class UberRecord
uberData: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [tripId: int, timeStamp: timestamp ... 1 more field]
uberData.count()
res1: Long = 1128663
uberData.show(5,false)
+------+-------------------+------------------------+
|tripId|timeStamp |latlon |
+------+-------------------+------------------------+
|2 |2007-01-06 06:23:27|[-122.436298, 37.800702]|
|6 |2007-01-04 01:04:58|[-122.429251, 37.79932] |
|8 |2007-01-03 00:59:01|[-122.444698, 37.759913]|
|11 |2007-01-06 09:08:04|[-122.422785, 37.801069]|
|14 |2007-01-02 05:18:37|[-122.422255, 37.764986]|
+------+-------------------+------------------------+
only showing top 5 rows
val uberOSMMapBoundingBoxFiltered = uberData
.filter($"latlon"(0) >= minLatInOSMMap &&
$"latlon"(0) <= maxLatInOSMMap &&
$"latlon"(1) >= minLonInOSMMap &&
$"latlon"(1) <= maxLonInOSMMap)
.cache()
uberOSMMapBoundingBoxFiltered.count()
uberOSMMapBoundingBoxFiltered: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [tripId: int, timeStamp: timestamp ... 1 more field]
res1: Long = 253696
uberOSMMapBoundingBoxFiltered.show(5,false)
+------+-------------------+------------------------+
|tripId|timeStamp |latlon |
+------+-------------------+------------------------+
|8 |2007-01-03 00:59:01|[-122.444698, 37.759913]|
|14 |2007-01-02 05:18:37|[-122.422255, 37.764986]|
|26 |2007-01-07 07:17:52|[-122.434058, 37.763653]|
|38 |2007-01-07 16:05:22|[-122.433124, 37.763497]|
|87 |2007-01-06 00:40:58|[-122.408277, 37.769129]|
+------+-------------------+------------------------+
only showing top 5 rows
The number of trajectory points that are not within our bounding box of the OSM is:
uberData.count() - uberOSMMapBoundingBoxFiltered.count()
res6: Long = 874967
We will consider a trip to be invalid when it contains fewer than two data points, since GraphHopper requires at least two. First identify all the trips that are valid.
val uberCountsFiltered = uberOSMMapBoundingBoxFiltered
.groupBy($"tripId".alias("validTripId"))
.count.filter($"count" > 1)
.drop("count")
uberCountsFiltered: org.apache.spark.sql.DataFrame = [validTripId: int]
uberCountsFiltered.show(5, false)
+-----------+
|validTripId|
+-----------+
|833 |
|1829 |
|3175 |
|5300 |
|5518 |
+-----------+
only showing top 5 rows
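The same idea, sketched with plain Scala collections on made-up trip ids (no Spark needed): group by id, count, and keep only ids that occur at least twice.

```scala
// Hypothetical mini data set: one entry per trajectory point
val tripIds = Seq(8, 14, 8, 26, 14, 8)

// keep only trip ids with at least two points, as GraphHopper requires
val validTripIds = tripIds
  .groupBy(identity)
  .collect { case (id, points) if points.size > 1 => id }
  .toSet

// validTripIds == Set(8, 14); trip 26 has a single point and is dropped
```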
Next, join this list of valid IDs with the original data set, keeping only the entries for those trips contained in uberCountsFiltered.
val uberValidData = uberOSMMapBoundingBoxFiltered
.join(uberCountsFiltered, uberOSMMapBoundingBoxFiltered("tripId") === uberCountsFiltered("validTripId")) // Only keep trips with at least 2 data points
.drop("validTripId").cache
uberValidData: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [tripId: int, timeStamp: timestamp ... 1 more field]
Now seeing how many data points were dropped:
uberOSMMapBoundingBoxFiltered.count - uberValidData.count
res10: Long = 221
uberValidData.show(5,false)
+------+-------------------+------------------------+
|tripId|timeStamp |latlon |
+------+-------------------+------------------------+
|8 |2007-01-03 00:59:01|[-122.444698, 37.759913]|
|14 |2007-01-02 05:18:37|[-122.422255, 37.764986]|
|26 |2007-01-07 07:17:52|[-122.434058, 37.763653]|
|38 |2007-01-07 16:05:22|[-122.433124, 37.763497]|
|87 |2007-01-06 00:40:58|[-122.408277, 37.769129]|
+------+-------------------+------------------------+
only showing top 5 rows
GraphHopper considers a trip to be a sequence of (latitude, longitude, time) tuples. First the relevant columns are selected from the DataFrame, and then the rows are mapped to key-value pairs with the tripId as the key. After this, the reduceByKey step merges all the (lat, lon, time) arrays for each key (tripId) so that there is one entry per tripId containing all the relevant data points.
// To use sql api instead of rdd api
// val ubers = uberValidData.
// select($"tripId", struct($"latlon"(0), $"latLon"(1), $"timeStamp").as("coord"))
// .groupBy($"tripId")
// .agg(collect_set("coord").as("coords"))
val ubers = uberValidData.select($"tripId", $"latlon", $"timeStamp")
.map( row => {
val id = row.get(0).asInstanceOf[Integer]
val time = row.get(2).asInstanceOf[java.sql.Timestamp].getTime
// NB: latlon is stored as [lon, lat] (see the loading cell above)
val latlon = row.get(1).asInstanceOf[scala.collection.mutable.WrappedArray[Double]]
val entry = Array((latlon(0), latlon(1), time))
(id, entry)
}
)
.rdd.reduceByKey( (e1, e2) => e1 ++ e2) // Sequence of timespace tuples
.cache
ubers: org.apache.spark.rdd.RDD[(Integer, Array[(Double, Double, Long)])] = ShuffledRDD[1195] at reduceByKey at command-2971213210274838:11
ubers.count
res12: Long = 8321
ubers.take(1) // first of 8,321 trip ids prepped and ready for map-matching
res13: Array[(Integer, Array[(Double, Double, Long)])] = Array((2100,Array((-122.430268,37.766517,1168142813000), (-122.430456,37.766368,1168142819000), (-122.430588,37.766267,1168142825000), (-122.430874,37.766065,1168142873000), (-122.431452,37.765596,1168142879000), (-122.43189,37.76524,1168142885000), (-122.432244,37.764965,1168142891000), (-122.432537,37.764759,1168142897000), (-122.432833,37.764534,1168142903000), (-122.433421,37.764042,1168142909000), (-122.434094,37.763526,1168142915000), (-122.406513,37.771497,1168142387000), (-122.40595,37.770276,1168142393000), (-122.406148,37.769156,1168142399000), (-122.407442,37.768924,1168142405000), (-122.409003,37.76907,1168142411000), (-122.410424,37.76931,1168142417000), (-122.412292,37.769523,1168142423000), (-122.414228,37.769585,1168142429000), (-122.416105,37.769652,1168142435000), (-122.4181,37.76988,1168142441000), (-122.419548,37.770068,1168142447000), (-122.420887,37.770144,1168142453000), (-122.422174,37.770579,1168142459000), (-122.422506,37.770788,1168142465000), (-122.422915,37.771149,1168142471000), (-122.422932,37.771242,1168142477000), (-122.423008,37.771513,1168142483000), (-122.423167,37.771772,1168142489000), (-122.423186,37.771882,1168142507000), (-122.423467,37.771914,1168142639000), (-122.42389,37.771557,1168142645000), (-122.424358,37.771214,1168142651000), (-122.42451,37.771112,1168142657000), (-122.424577,37.77107,1168142699000), (-122.424858,37.770832,1168142705000), (-122.425321,37.770462,1168142711000), (-122.425981,37.769912,1168142717000), (-122.426489,37.769485,1168142723000), (-122.42671,37.769349,1168142729000), (-122.426785,37.769304,1168142735000), (-122.427076,37.769072,1168142741000), (-122.427508,37.768713,1168142747000), (-122.42785,37.768438,1168142753000), (-122.427882,37.768396,1168142759000), (-122.427958,37.768333,1168142765000), (-122.428191,37.76816,1168142783000), (-122.428687,37.767765,1168142789000), (-122.429273,37.767284,1168142795000), 
(-122.429752,37.766923,1168142801000), (-122.430132,37.766626,1168142807000))))
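The select-map-reduceByKey step above can be mimicked with plain Scala collections on a hypothetical mini data set, merging (lon, lat, time) tuples per trip id (the real pipeline sorts each trip by time later, inside the matching step; the sort is included here for clarity):

```scala
// Hypothetical records: (tripId, (lon, lat, epochMillis))
val records = Seq(
  (1, (-122.43, 37.76, 200L)),
  (2, (-122.41, 37.77, 100L)),
  (1, (-122.44, 37.75, 100L))
)

// group all tuples per trip id, then order each trip's points by timestamp
val trips = records
  .groupBy(_._1)
  .map { case (id, rs) => (id, rs.map(_._2).sortBy(_._3).toArray) }

trips(1).map(_._3) // times for trip 1 come out ordered: Array(100, 200)
```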
- Start Map Matching
Now stepping into GraphHopper land, we first define some utility functions for interfacing with the GraphHopper map-matching library. Attach the following artefact:
- com.graphhopper:map-matching:0.6.0
This function takes a MatchResult from GraphHopper and converts it into an Array of (lon, lat) points.
def extractLatLong(mr: MatchResult): Array[(Double, Double)] = {
val pointsList = mr.getEdgeMatches.asScala.zipWithIndex
.map{ case (e, i) =>
if (i == 0) e.getEdgeState.fetchWayGeometry(3) // 3 fetches all nodes of the edge (tower + pillar)
else e.getEdgeState.fetchWayGeometry(2) } // 2 skips the base node so it is not duplicated
.map{case pointList => pointList.asScala.toArray}
.flatMap{ case e => e}
val latLongs = pointsList.map(point => (point.lon, point.lat)).toArray
latLongs
}
extractLatLong: (mr: com.graphhopper.matching.MatchResult)Array[(Double, Double)]
The following returns a new GraphHopper object and encoder. It reads the pre-generated GraphHopper 'database' from DBFS; this way multiple GraphHopper objects can be created on the workers, all reading from the same shared database.
Currently the documentation is scattered all over the place, if it exists at all. The method to create the graph as specified in the map-matching repository differs from the main GraphHopper repository. The API should hopefully converge as GraphHopper matures.
See the main GraphHopper documentation here, and the map-matching documentation here.
This function returns a new GraphHopper object, with all settings defined, reading the graph from the location in DBFS. Note: setAllowWrites(false) ensures that multiple GraphHopper objects can read from the same files simultaneously.
def getHopper = {
val enc = new CarFlagEncoder() // Vehicle type
val hopp = new GraphHopper()
.setStoreOnFlush(true)
.setCHWeightings("shortest") // Contraction Hierarchy settings
.setAllowWrites(false) // Avoids issues when reading the graph object from DBFS
.setGraphHopperLocation("/dbfs/files/graphhopper/graphHopperData")
.setEncodingManager(new EncodingManager(enc))
hopp.importOrLoad()
(hopp, enc)
}
getHopper: (com.graphhopper.GraphHopper, com.graphhopper.routing.util.CarFlagEncoder)
The next step does the actual map matching. It begins by creating a new GraphHopper object for each partition; this is done because the GraphHopper objects themselves are not Serializable, so they must be created on the partitions themselves to avoid that serialization step.
Once all the GraphHopper and MapMatching objects are created and initialised, map matching runs for each trajectory on that partition. The actual map matching is done in the mm.doWork() call, which returns a MatchResult object (it is wrapped in a Try statement since an exception is raised when no match is found). Failed matches are filtered out and replaced by dummy data; on success, the coordinates of the matched points are extracted into an array of (latitude, longitude).
The last (optional) step estimates the time taken to get from one matched point to another, as no time information is retained after the data has been map matched. This is a rather crude way of doing this and more sophisticated methods would be preferable.
Let's recall this most useful transformation first!
mapPartitions
Return a new RDD by applying a function to each partition of the RDD.

// let's look at a simple example of mapPartitions in action
val x = sc.parallelize(Array(1,2,3), 2) // RDD with 2 partitions
x: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[96] at parallelize at command-2971213210274852:2
// our baby function we will call
def f(i:Iterator[Int])={ (i.sum, 42).productIterator }
f: (i: Iterator[Int])Iterator[Any]
val y = x.mapPartitions(f)
y: org.apache.spark.rdd.RDD[Any] = MapPartitionsRDD[97] at mapPartitions at command-2971213210274854:1
// glom() flattens elements on the same partition
val xOut = x.glom().collect()
xOut: Array[Array[Int]] = Array(Array(1), Array(2, 3))
val yOut = y.glom().collect() // we can see the mapPartitions with f applied to each partition
yOut: Array[Array[Any]] = Array(Array(1, 42), Array(5, 42))
Having understood the basic power of mapPartitions transformation, let's get back to map-matching problem at hand.
val matchTrips = ubers
.mapPartitions(partition => {
// Create the map matching object only once for each partition
val (hopp, enc) = getHopper
val tripGraph = hopp.getGraphHopperStorage()
val locationIndex = new LocationIndexMatch(tripGraph,
hopp.getLocationIndex().asInstanceOf[LocationIndexTree])
val mm = new MapMatching(tripGraph, locationIndex, enc)
def extractLatLong(mr: MatchResult): Array[(Double, Double)] = {
val pointsList = mr.getEdgeMatches.asScala.zipWithIndex
.map{ case (e, i) =>
if (i == 0) e.getEdgeState.fetchWayGeometry(3) // 3 fetches all nodes of the edge (tower + pillar)
else e.getEdgeState.fetchWayGeometry(2) } // 2 skips the base node so it is not duplicated
.map{case pointList => pointList.asScala.toArray}
.flatMap{ case e => e}
val latLongs = pointsList.map(point => (point.lon, point.lat)).toArray
latLongs
}
// Map matching parameters
// Have not found any documentation on what these do, other than comments in the source code
// mm.setMaxSearchMultiplier(2000)
mm.setSeparatedSearchDistance(600)
mm.setForceRepair(true)
// Do the map matching for each trajectory
val matchedPartition = partition.map{case (key, dataPoints) => {
val sortedPoints = dataPoints.sortWith( (a, b) => a._3 < b._3) // Sort by time
val gpxEntries = sortedPoints.map{ case (lat, lon, time) => new GPXEntry(lon, lat, time)}.toList.asJava
val mr = Try(mm.doWork(gpxEntries)) // mapMatch the trajectory, Try() wraps the exception when no match can be found
val points = mr match {
case Success(result) => {
val pointsList = result.getEdgeMatches.asScala.zipWithIndex // (edge, index tuple)
.map{ case (e, i) =>
if (i == 0) e.getEdgeState.fetchWayGeometry(3) // 3 fetches all nodes of the edge (tower + pillar)
else e.getEdgeState.fetchWayGeometry(2) // 2 skips the base node so it is not duplicated
}
.map{case pointList => pointList.asScala.toArray}
.flatMap{ case e => e}
val latLongs = pointsList.map(point => (point.lon, point.lat)).toArray
latLongs
}
case Failure(_) => Array[(Double, Double)]() // When no match can be made
}
// Use GraphHopper routing to get time estimates for the new matched trajectory
/// NOTE: Currently only calculates time offsets from 0
val times = points.iterator.sliding(2).map{ pair =>
val (lonFrom, latFrom) = pair(0)
val (lonTo, latTo) = pair(1)
val req = new GHRequest(latFrom, lonFrom, latTo, lonTo)
.setWeighting("shortest")
.setVehicle("car")
.setLocale("US")
// val time = hopp.route(req).getTime -- using new method
val time = hopp.route(req).getBest.getTime
time
}
val timeOffsets = times.scanLeft(0.toLong){ (a: Long, b: Long) => a + b }.toList
(key, points.zip(timeOffsets)) // Return a tuple of (key, Array((lat, lon), timeOffSetFromStart))
}}
matchedPartition
}).cache
matchTrips: org.apache.spark.rdd.RDD[(Integer, Array[((Double, Double), Long)])] = MapPartitionsRDD[1196] at mapPartitions at command-2971213210274858:2
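The time-offset bookkeeping at the end of the matching step (the `scanLeft` over pairwise routing times) can be illustrated on its own with made-up per-segment travel times:

```scala
// Hypothetical per-segment travel times (ms) between consecutive matched points,
// standing in for the hopp.route(...).getBest.getTime calls above
val segmentTimes = Iterator(30L, 45L, 25L)

// scanLeft turns segment durations into cumulative offsets from trip start,
// with an extra leading 0 for the first point
val timeOffsets = segmentTimes.scanLeft(0L)(_ + _).toList
// timeOffsets == List(0, 30, 75, 100)
```

The leading zero is why `points.zip(timeOffsets)` lines up: the first matched point is at offset 0 from the start of the trip.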
display(matchTrips.toDF.limit(2))
// Define the schema of the points in a map matched trip
case class UberMatched(id: Int, lat: Double, lon: Double, time: Long)
defined class UberMatched
Here we convert the map-matched points into a DataFrame and explore the matched points to check they are sensible.
// Create a dataframe to better explore the matched trajectories, make sure it is sensible
val matchTripsDF = matchTrips.map{case (id, points) =>
points.map(point => UberMatched(id, point._1._1, point._1._2, point._2 ))
}
.flatMap(uberMatched => uberMatched)
.toDF.cache
matchTripsDF: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [id: int, lat: double ... 2 more fields]
matchTripsDF.groupBy($"id").count.orderBy(-$"count").show(10)
+-----+-----+
| id|count|
+-----+-----+
|11721| 418|
|23602| 264|
| 3586| 250|
| 3719| 247|
| 5783| 225|
|10858| 217|
| 7092| 212|
|10842| 212|
|12734| 212|
| 1333| 208|
+-----+-----+
only showing top 10 rows
Finally it is helpful to be able to visualise the results of the map matching.
The next few steps take the map-matched trips and convert them into JSON using the Spray-Json library. See here for documentation on the library.
To make the visualisation less cluttered only two trips will be selected, though little would have to be done to extend this to multiple or all of the trajectories.
Here we select only those points that belong to the trips with ids 11721 and 10858, chosen because they contain the most points after map matching.
val filterTrips = matchTrips.filter{case (id, values) => id == 11721 || id == 10858 }.cache
filterTrips: org.apache.spark.rdd.RDD[(Integer, Array[((Double, Double), Long)])] = MapPartitionsRDD[115] at filter at command-2971213210274866:1
Next, define a schema for the JSON representation of a trajectory. Then the filtered trips are collected to the driver and converted to strings of JSON.
// Convert our Uber data points into GeoJson Geometries
// Is not fully compliant with the spec but in a format that Leaflet understands
case class UberData(`type`: String = "MultiPoint",
coordinates: Array[(Double, Double)])
object UberJsonProtocol extends DefaultJsonProtocol {
implicit val uberDataFormat = jsonFormat2(UberData)
}
import UberJsonProtocol._
val mapMatchedTrajectories = filterTrips.collect.map{case (key, matchedPoints) => { // change filterTrips to matchTrips to get all matched trajectories as json
val jsonString = UberData(coordinates = matchedPoints.map{case ((lat, lon), time) => (lat, lon)}).toJson.prettyPrint
jsonString
}}
defined class UberData
defined object UberJsonProtocol
import UberJsonProtocol._
mapMatchedTrajectories: Array[String] =
Array({
"type": "MultiPoint",
"coordinates": [[-122.42245184084292, 37.77072625839843], [-122.42243656715236, 37.77058581495104], [-122.42240359833248, 37.77025929324908], [-122.4224008043647, 37.77022911839699], [-122.4223676492803, 37.76987353943006], [-122.42172391910233, 37.76991358630166], [-122.42160135704879, 37.76992345832117], [-122.4214882944857, 37.76993705563107], [-122.42136852639992, 37.76995251558615], [-122.42125322866261, 37.769969651921905], [-122.42113383310587, 37.76998883716737], [-122.42100940840713, 37.76997337721229], [-122.42095278399331, 37.769978406354305], [-122.42077732281635, 37.76999368004487], [-122.42028837845373, 37.77000932626447], [-122.42018705055536, 37.7700085812064], [-122.42019748136842, 37.7699178703856], [-122.4201181326833, 37.769236142245745], [-122.42008423254082, 37.76918175300617], [-122.4200527538371, 37.76923651477478], [-122.4200313334174, 37.76943265131338], [-122.42002574548182, 37.76953025392138], [-122.42001345202357, 37.769660266555704], [-122.41998607113926, 37.7699178703856], [-122.41996651336476, 37.770005973503125], [-122.41984357878216, 37.77000057183207], [-122.41926019830838, 37.769963877721814], [-122.41909088386053, 37.76994469247635], [-122.41891113859961, 37.7699178703856], [-122.41878541004922, 37.76989794008206], [-122.41829665195115, 37.76981821886789], [-122.41782111863392, 37.769741664150544], [-122.41772891769698, 37.76972694925354], [-122.41753948668106, 37.76969547054981], [-122.4173642117686, 37.76966864845906], [-122.41688439436743, 37.76960867128392], [-122.41674693115235, 37.76959824047085], [-122.41665435768637, 37.7695909761546], [-122.41619428432422, 37.769570859586544], [-122.41606762445124, 37.76956285021222], [-122.4155909735469, 37.76953304788917], [-122.41450244369736, 37.76950417688871], [-122.41378495276983, 37.76948741308199], [-122.41340273797667, 37.769474560830176], [-122.41220785108673, 37.76943842551347], [-122.41199159798008, 37.76943209251982], [-122.41131154622089, 37.76938534012553], 
[-122.41121822769682, 37.769376399428616], [-122.41115731919909, 37.769370625228525], [-122.41101464057746, 37.769354420215365], [-122.4109047445112, 37.76943488648761], [-122.41077063405746, 37.76954627267002], [-122.4104606898977, 37.76979866109338], [-122.41031279586954, 37.76991340003714], [-122.41022636913269, 37.76998045526401], [-122.40974413029278, 37.77035447441834], [-122.40962901881998, 37.770334357850274], [-122.40929392895015, 37.770282017520415], [-122.40906240215293, 37.77022297166786], [-122.40884540398818, 37.77013673119553], [-122.40854067523496, 37.76996927939287], [-122.40832609850897, 37.7698349826746], [-122.40775855051932, 37.769472139391425], [-122.40708874330868, 37.769007595680826], [-122.40683840379502, 37.768820027310106], [-122.4063531847228, 37.76846202690442], [-122.4059715287232, 37.76817015040301], [-122.40576030475856, 37.76796693581269], [-122.40560086233022, 37.767732987576714], [-122.40553455216143, 37.76757298635482], [-122.40549711299309, 37.767405162023124], [-122.405491338793, 37.76732115672502], [-122.40548221183155, 37.767276453240434], [-122.40541385275306, 37.76674075648354], [-122.40533245515822, 37.76631756349618], [-122.40525645923442, 37.76605064644033], [-122.40513762247124, 37.76481217365292], [-122.40511340808376, 37.764428096214566], [-122.40511191796762, 37.764387118020366], [-122.40511229049666, 37.764303298986775], [-122.40512979936145, 37.76398329654299], [-122.40516705226527, 37.76372625150665], [-122.40522740196946, 37.76346250094762], [-122.40535387557792, 37.763090530703], [-122.40559397054301, 37.76260307645656], [-122.4061734394619, 37.76165648017056], [-122.40629581525093, 37.76140763077306], [-122.40639081015566, 37.76115263464643], [-122.40645637526639, 37.76089186431971], [-122.40648915782174, 37.76062345714771], [-122.40649027540886, 37.760358775266084], [-122.4064604730858, 37.76009297579735], [-122.40635430230992, 37.759696232371695], [-122.40624906285665, 37.75945166705813], 
[-122.40603057457575, 37.75908621607169], [-122.40585082931483, 37.758859904680996], [-122.40563178224039, 37.758637691109726], [-122.405418136837, 37.75845440682294], [-122.4044393167892, 37.75773505325023], [-122.40410292306773, 37.757449137213435], [-122.40381235041795, 37.75713416391166], [-122.40364862390567, 37.75690058820472], [-122.40351767994875, 37.75665658168472], [-122.40338207937886, 37.75629262081443], [-122.40328447677086, 37.75577648183204], [-122.40304456807027, 37.753280164747245], [-122.40301904983116, 37.75275210483563], [-122.40302016741828, 37.75262414111102], [-122.40311255461974, 37.75222106469172], [-122.40330012299046, 37.75159651975922], [-122.40334724791379, 37.75146576206682], [-122.40350594528405, 37.75110049734489], [-122.40397365549148, 37.74993131495859], [-122.40405896464122, 37.749760324130065], [-122.40410981485493, 37.749678367741666], [-122.40420015314669, 37.74960479325663], [-122.40431377450334, 37.749536434178125], [-122.40456467281054, 37.749446282150885], [-122.40481277714996, 37.74940493142765], [-122.40491466384191, 37.7494112644213], [-122.40501748185643, 37.749430263402246], [-122.40516649347171, 37.7494883779322], [-122.40536486518454, 37.7495772261078], [-122.40538982463009, 37.74960702843086], [-122.40553064060653, 37.74984898604115], [-122.40555466872948, 37.74989443458381], [-122.4057696179845, 37.750304030261276], [-122.40605516149228, 37.75084438363115], [-122.40610843314472, 37.751026177801776], [-122.40614456846143, 37.75138157050419], [-122.4061715768167, 37.75163172375333], [-122.40617921366199, 37.751738453322766], [-122.406184242804, 37.75181538056915], [-122.40625129803087, 37.75252877367725], [-122.4062878058766, 37.752933153948184], [-122.4062905998444, 37.75300225808476], [-122.40641278936891, 37.752984004161895], [-122.40651244088663, 37.75296928926489], [-122.4072226674979, 37.75292011543185], [-122.40732176022206, 37.75291359617368], [-122.40740557925565, 37.752908380767146], [-122.40813294220268, 
37.75286479486968], [-122.40821378100397, 37.75285976572766], [-122.40829014945679, 37.75285548164373], [-122.40903148224275, 37.752812454539814], [-122.40914864262525, 37.75280556275261], [-122.40927343985304, 37.75279829843637], [-122.41001421384546, 37.7527547125389], [-122.41008909218212, 37.752750242190444], [-122.41016657822206, 37.752745585577465], [-122.41090493077573, 37.7527010683574], [-122.41098223055114, 37.75269622547991], [-122.41105971659108, 37.75269138260241], [-122.41179974552541, 37.7526483554985], [-122.41189567175275, 37.75264202250485], [-122.41203499761302, 37.752632895543414], [-122.41225963262303, 37.75261780811737], [-122.41248482642662, 37.752603093220365], [-122.41289721607187, 37.752578133774804], [-122.41293372391762, 37.7525760848651], [-122.41301288633822, 37.752571241987596], [-122.41308403938451, 37.75256677163914], [-122.41311160653333, 37.75256528152299], [-122.4135172906559, 37.75254088087099], [-122.41380134404751, 37.75252374453523], [-122.41398928494728, 37.752512382399566], [-122.41410234751037, 37.75250567687688], [-122.41423422278987, 37.752498598825156], [-122.41510854844246, 37.75244514090818], [-122.41518864218567, 37.752440298030685], [-122.41528102938715, 37.75243452383059], [-122.41615107095579, 37.752382556029765], [-122.41628164238367, 37.752374546655446], [-122.41641891933423, 37.75236709607468], [-122.41644611395402, 37.752364674635935], [-122.41684248485065, 37.75234083277749], [-122.41700099595639, 37.75233133328702], [-122.41727815756079, 37.7523145694803], [-122.41737389752359, 37.75230898154472], [-122.41747615674457, 37.75230264855108], [-122.41789525191251, 37.75227768910552], [-122.4180444497923, 37.75226856214408], [-122.4181517381553, 37.75226204288592], [-122.41833707135179, 37.75225049448573], [-122.41846335869573, 37.752243416434005], [-122.41859653782687, 37.75223577958872], [-122.41864999574386, 37.75223279935642], [-122.41899514389772, 37.752213427846435], [-122.41903332812413, 
37.752211192672206], [-122.41945596231794, 37.75218586069761], [-122.41956958367459, 37.75217971396848], [-122.41967482312786, 37.75217393976839], [-122.42008460506986, 37.752145813826004], [-122.42021722540746, 37.75213799071621], [-122.42044968352727, 37.75212383461275], [-122.4206655641049, 37.75211098236094], [-122.42096768515485, 37.752092728438065], [-122.42114370512539, 37.752082297624995], [-122.42159576911321, 37.75205491674069], [-122.42274613878308, 37.75198581260411], [-122.42283442816513, 37.75198059719757], [-122.4229268153666, 37.75197482299748], [-122.42305887691063, 37.7519671861522], [-122.42399448359001, 37.75191167932552], [-122.4245283177017, 37.75187908303467], [-122.42509512063329, 37.75184443783412], [-122.42524282839692, 37.75183568340173], [-122.42542350498043, 37.7518246937951], [-122.4255550077309, 37.751816870685296], [-122.42580981759302, 37.75180159699473], [-122.4263317307755, 37.751770118291006], [-122.42730943323619, 37.7517114449675], [-122.4274325540833, 37.75170418065125], [-122.42744857283193, 37.751703249328656], [-122.4274705520452, 37.75170194547702], [-122.42845365617693, 37.751642527095434], [-122.42854771975907, 37.75163693915986], [-122.42866134111571, 37.75163004737266], [-122.42952374583908, 37.75157826583635], [-122.42965878761542, 37.75157007019751], [-122.42982530809549, 37.751560011913476], [-122.43069106558019, 37.751508044112654], [-122.43085479209248, 37.75149817209314], [-122.43102913568234, 37.75148774128007], [-122.43173936229361, 37.7514450867052], [-122.43187682550871, 37.75143670480184], [-122.4320485613953, 37.751426087724255], [-122.43314174785782, 37.75135884623287], [-122.43372531459612, 37.75132289718068], [-122.43381379024268, 37.75131730924511], [-122.4341086469764, 37.75129924158676], [-122.43445211874959, 37.75127875248966], [-122.4358818851981, 37.751193443339915], [-122.43617022267364, 37.75117630700416], [-122.43630042157248, 37.75116848389436], [-122.43645744256207, 37.75115898440389], 
[-122.4383871429798, 37.75104089269878], [-122.43849163737502, 37.75103455970513], [-122.43850057807194, 37.751126574377565], [-122.43856800582785, 37.751832516904905], [-122.43864437428067, 37.75263028784015], [-122.4387216740561, 37.753435881885196], [-122.43873191860465, 37.75354372904175], [-122.43879766997988, 37.7542293687365], [-122.43890384075576, 37.75533708383151], [-122.43888428298125, 37.75541363854886], [-122.43902025608018, 37.75539706100666], [-122.4396468499224, 37.75531994749576], [-122.43996908754042, 37.755253451062444], [-122.44005495548372, 37.75522681523621], [-122.44018049776957, 37.75523631472669], [-122.44025556237077, 37.75527729292089], [-122.44029169768747, 37.755338946476705], [-122.44012536347192, 37.75564386149445], [-122.44010282546512, 37.75574537565735], [-122.4400860616584, 37.75581652870364], [-122.44008382648417, 37.75598342171274], [-122.44015367567883, 37.75641872189385], [-122.44026208162893, 37.75640996746145], [-122.44090711565853, 37.75625946573003], [-122.44103526564766, 37.756229477142455], [-122.44106059762225, 37.75627492568511], [-122.44112187864904, 37.75635855845418], [-122.44121538343762, 37.756444240132964], [-122.44141412767948, 37.756549852115285], [-122.4415674233787, 37.756590830309484], [-122.44202265386335, 37.756654346510494], [-122.44210684542597, 37.75667818836894], [-122.44217688088514, 37.756704451666124], [-122.44224374984749, 37.75673611663437], [-122.4423139715712, 37.756780075060874], [-122.44248309975453, 37.75692554765028], [-122.44293516374235, 37.75754711235047], [-122.44309535122876, 37.757728533992065], [-122.44316426910082, 37.75779689307057], [-122.4432460392247, 37.75787363405243], [-122.44332892693569, 37.75794571842132], [-122.44340939320794, 37.75801053847396], [-122.44347458578962, 37.75805915351344], [-122.44363905735997, 37.75816737319903], [-122.44366681077332, 37.75818618591546], [-122.44371896483867, 37.75821431185784], [-122.44376702108458, 37.75823852624532], 
[-122.44390988597073, 37.758302787504405], [-122.44451822589006, 37.75858255681207], [-122.44463631759515, 37.75867103245864], [-122.44473671417094, 37.75878297743461], [-122.44477601598447, 37.75884183702264], [-122.44480879853984, 37.758906657075286], [-122.44486784439239, 37.75916761366653], [-122.4448445613275, 37.75937958268925], [-122.44482053320453, 37.7594542747614], [-122.44479408364283, 37.75950903653001], [-122.44468958924762, 37.7596699690745], [-122.44464693467275, 37.75971746652687], [-122.44469797115097, 37.75970126151371], [-122.44479464243638, 37.759634392551355], [-122.44495222221953, 37.75936002491474], [-122.44499599438151, 37.759316997810835], [-122.44500102352353, 37.75924044309349], [-122.44499841582027, 37.759163515847106], [-122.44498891632979, 37.75908938256851], [-122.44497196625855, 37.75901599434799], [-122.44487902026353, 37.758804956647865], [-122.44477582971996, 37.75868388471046], [-122.44463929782746, 37.75857193973449], [-122.44448469827663, 37.75848383661696], [-122.44389088698978, 37.75821021403842], [-122.44382643946618, 37.75816979463778], [-122.44374783583912, 37.75811615045628], [-122.44356213011359, 37.75798408891225], [-122.44349786885451, 37.75793603266633], [-122.44342261798879, 37.75787586922666], [-122.44334289677462, 37.75780657882556], [-122.44326038159267, 37.75773207301793], [-122.44312887884219, 37.7576069032611], [-122.44297241664616, 37.75744373554238], [-122.44256282096869, 37.75688196175282], [-122.44244510179263, 37.75677020304136], [-122.44230652099043, 37.75667911969153], [-122.44223164265375, 37.756641121729636], [-122.4421433532717, 37.7566085254388], [-122.44196379427531, 37.75656382195422], [-122.44160039219857, 37.756517814618], [-122.44146106633829, 37.756474973778616], [-122.44134465101386, 37.75641015372597], [-122.44121985378608, 37.756303424156535], [-122.44115093591401, 37.75620265505171], [-122.44111182036501, 37.75607413253354], [-122.44111535939086, 37.755970941989965], [-122.44114665183008, 
37.75585154643323], [-122.44120681526974, 37.75573867013466], [-122.44167303536102, 37.7551027630665], [-122.44191704188101, 37.754524970528294], [-122.44245217984435, 37.75379239217473], [-122.44253003841334, 37.753632204688316], [-122.44257902598186, 37.75346847817604], [-122.44267327582851, 37.75309091999585], [-122.44279043621101, 37.75262209220131], [-122.44285097217971, 37.752379575797455], [-122.44288189208989, 37.75225645495034], [-122.4430519515958, 37.751575099339526], [-122.4430664802283, 37.75151754360313], [-122.44338983543344, 37.7502222601374], [-122.44341144211765, 37.75013564713603], [-122.44343230374379, 37.750052386895995], [-122.44350159414489, 37.74977522529159], [-122.4435749823654, 37.74950029886142], [-122.4436636442765, 37.749304348587344], [-122.44415277490361, 37.74863696281545], [-122.44424609342768, 37.74845721755454], [-122.44427161166679, 37.748329998888], [-122.44427012155063, 37.74820035878272], [-122.4441691661813, 37.74768254341966], [-122.44418928274935, 37.74752719881074], [-122.44425745556335, 37.747374834434126], [-122.44429396340908, 37.74732715071724], [-122.44433289269357, 37.74728486867141], [-122.44444018105656, 37.74719136388283], [-122.44456050793589, 37.74711704433971], [-122.44468809913147, 37.74706898809379], [-122.44479911278485, 37.74704551876438], [-122.44487902026353, 37.747034156628715], [-122.44489075492824, 37.747117789397784], [-122.44492279242552, 37.7473433557304], [-122.4450084741043, 37.747946293978686], [-122.44501257192371, 37.74798112544375], [-122.44501517962698, 37.748013721734594], [-122.44501946371092, 37.74809158030357], [-122.44503306102081, 37.74830839220379], [-122.44503995280802, 37.74837339852095], [-122.4450563440857, 37.748503224890754], [-122.44507087271819, 37.74861815009903], [-122.44505131494368, 37.74878560190169], [-122.44488144170228, 37.74909815376471], [-122.4448533157599, 37.7492112163278], [-122.44484679650174, 37.74925498848978], [-122.44483059148857, 37.749368423581906], 
[-122.44492614518686, 37.74959305859193], [-122.44498053442643, 37.749668495722155], [-122.44528135162476, 37.74993429519089], [-122.44562314701729, 37.750238278886044], [-122.44574682665795, 37.75043348410205], [-122.44575558109035, 37.7505033332967], [-122.44575166953545, 37.750522704806684], [-122.44574515027729, 37.75054095872956], [-122.44544843089838, 37.75082389453405], [-122.4453169281479, 37.75089355746419], [-122.44528936099908, 37.75090119430947], [-122.44528805714745, 37.75093751589069], [-122.44510607671229, 37.75134431760038], [-122.44509694975086, 37.75137747268477], [-122.44508670520231, 37.75141751955638], [-122.44506584357617, 37.75165947716667], [-122.44510309648, 37.75186101537632], [-122.4451695929133, 37.752015428662645], [-122.44545010727904, 37.75251424504476], [-122.44547413540201, 37.752598250342864], [-122.44545588147915, 37.752727145390075], [-122.44527408730852, 37.753161700513104], [-122.44525415700497, 37.753356719464584], [-122.44525061797911, 37.75338354155534], [-122.44524633389517, 37.75340552076859], [-122.44513979059025, 37.75363816515293], [-122.44493564467733, 37.75390079812484], [-122.44490062694774, 37.75394196258355], [-122.44486858945046, 37.75398405836487], [-122.44471957783519, 37.75424352483996], [-122.4446718941183, 37.75438974248744], [-122.44464805225986, 37.75454378324473], [-122.44466183583428, 37.75478704470665], [-122.4447605560294, 37.75506811786595], [-122.44474695871949, 37.75521061022305], [-122.44468083481522, 37.75530877162461], [-122.44456628213598, 37.75540674676165], [-122.44441969195947, 37.7554950361437], [-122.44432525584828, 37.75551329006657], [-122.44423622140816, 37.75551235874397]]
}, {
"type": "MultiPoint",
"coordinates": [[-122.41189697560438, 37.772072950871426], [-122.4115702676379, 37.77181236680923], [-122.41128695930436, 37.7715866142121], [-122.4109047445112, 37.771281140400795], [-122.41059219264818, 37.77103489870656], [-122.41056462549935, 37.77101273322879], [-122.41011107139538, 37.77064635091975], [-122.40974413029278, 37.77035447441834], [-122.40962901881998, 37.770334357850274], [-122.40929392895015, 37.770282017520415], [-122.40906240215293, 37.77022297166786], [-122.40884540398818, 37.77013673119553], [-122.40854067523496, 37.76996927939287], [-122.40832609850897, 37.7698349826746], [-122.40775855051932, 37.769472139391425], [-122.40708874330868, 37.769007595680826], [-122.40683840379502, 37.768820027310106], [-122.4063531847228, 37.76846202690442], [-122.4059715287232, 37.76817015040301], [-122.40576030475856, 37.76796693581269], [-122.40560086233022, 37.767732987576714], [-122.40553455216143, 37.76757298635482], [-122.40549711299309, 37.767405162023124], [-122.405491338793, 37.76732115672502], [-122.40548221183155, 37.767276453240434], [-122.40541385275306, 37.76674075648354], [-122.40533245515822, 37.76631756349618], [-122.40525645923442, 37.76605064644033], [-122.40513762247124, 37.76481217365292], [-122.40511340808376, 37.764428096214566], [-122.40511191796762, 37.764387118020366], [-122.40511229049666, 37.764303298986775], [-122.40512979936145, 37.76398329654299], [-122.40516705226527, 37.76372625150665], [-122.40522740196946, 37.76346250094762], [-122.40535387557792, 37.763090530703], [-122.40559397054301, 37.76260307645656], [-122.4061734394619, 37.76165648017056], [-122.40629581525093, 37.76140763077306], [-122.40639081015566, 37.76115263464643], [-122.40645637526639, 37.76089186431971], [-122.40648915782174, 37.76062345714771], [-122.40649027540886, 37.760358775266084], [-122.4064604730858, 37.76009297579735], [-122.40635430230992, 37.759696232371695], [-122.40624906285665, 37.75945166705813], [-122.40603057457575, 37.75908621607169], 
[-122.40585082931483, 37.758859904680996], [-122.40563178224039, 37.758637691109726], [-122.405418136837, 37.75845440682294], [-122.4044393167892, 37.75773505325023], [-122.40410292306773, 37.757449137213435], [-122.40381235041795, 37.75713416391166], [-122.40364862390567, 37.75690058820472], [-122.40351767994875, 37.75665658168472], [-122.40338207937886, 37.75629262081443], [-122.40328447677086, 37.75577648183204], [-122.40304456807027, 37.753280164747245], [-122.40301904983116, 37.75275210483563], [-122.40302016741828, 37.75262414111102], [-122.40311255461974, 37.75222106469172], [-122.40330012299046, 37.75159651975922], [-122.40334724791379, 37.75146576206682], [-122.40347316272869, 37.751091370383456], [-122.40387642541252, 37.749896297229], [-122.40389225789664, 37.74986388720268], [-122.4039377064393, 37.74981415457608], [-122.40402860352461, 37.74973908997489], [-122.40416439035903, 37.749664397902734], [-122.40463563959231, 37.749497691158155], [-122.40476937751701, 37.749389285208046], [-122.40483326624707, 37.74926839953516], [-122.40483922671167, 37.74913950448795], [-122.40478558253018, 37.74901731496343], [-122.40468276451564, 37.74891375189082], [-122.4044149161372, 37.74871090982953], [-122.40412471601645, 37.74848776493567], [-122.4040509552669, 37.74841344539255], [-122.40398799785945, 37.74834452752049], [-122.40392485418748, 37.74825120899643], [-122.4038879738127, 37.74815118494968], [-122.40387288638665, 37.74812250021374], [-122.40381253668247, 37.74796361657896], [-122.40379316517249, 37.74780044886024], [-122.40382296749554, 37.74764249654805], [-122.40391088434855, 37.747388059214984], [-122.40431284318073, 37.746326537720705], [-122.40433333227783, 37.746106559323664], [-122.40432364652284, 37.74600411383817], [-122.40428974638037, 37.745946185572734], [-122.40423945496022, 37.74590185461719], [-122.40413365671337, 37.74586124895203], [-122.40404741624104, 37.74586571930049], [-122.40396136203321, 37.74589906064941], [-122.4038605929284, 
37.74601007430278], [-122.40383228072149, 37.746066326187545], [-122.4036722794996, 37.74691699624621], [-122.40364229091202, 37.74708984971992], [-122.40366985806085, 37.74729269178121], [-122.40359535225322, 37.747486779410096], [-122.40360392042109, 37.74773451122048], [-122.40361826278907, 37.74788389536479], [-122.40362031169877, 37.74791481527496], [-122.40367246576412, 37.747960636346654], [-122.40376876452048, 37.7482245731702], [-122.40384587803139, 37.74835011545606], [-122.4040069968404, 37.74854867343341], [-122.40419028112717, 37.74872767363625], [-122.40459410260455, 37.74900744294392], [-122.40468965630285, 37.749071704203004], [-122.40472877185185, 37.74909796750019], [-122.40478297482692, 37.749135220404014], [-122.40497072946215, 37.749265046773814], [-122.40505249958602, 37.74932129865858], [-122.40513762247124, 37.74937978571757], [-122.40533990573898, 37.7495498452235], [-122.40536486518454, 37.7495772261078], [-122.40538982463009, 37.74960702843086], [-122.40553064060653, 37.74984898604115], [-122.40555466872948, 37.74989443458381], [-122.4057696179845, 37.750304030261276], [-122.40605516149228, 37.75084438363115], [-122.40610843314472, 37.751026177801776], [-122.40614456846143, 37.75138157050419], [-122.4061715768167, 37.75163172375333], [-122.40617921366199, 37.751738453322766], [-122.406184242804, 37.75181538056915], [-122.40625129803087, 37.75252877367725], [-122.4062878058766, 37.752933153948184], [-122.4062905998444, 37.75300225808476], [-122.40629525645737, 37.753083841944125], [-122.40632133349004, 37.75332524076086], [-122.40634275390974, 37.75356067911299], [-122.40640701516882, 37.75421390878142], [-122.4064146520141, 37.754292698673], [-122.4064418466339, 37.754581036148544], [-122.4064487384211, 37.754652189194836], [-122.40658862307494, 37.75610896399861], [-122.40659495606859, 37.75617564669644], [-122.4066009165332, 37.75624679974273], [-122.40664506122421, 37.75677113436396], [-122.40665139421786, 37.75684582643611], 
[-122.40666014865026, 37.75691567563077], [-122.40669702902504, 37.75721500271294], [-122.40670969501234, 37.75739307159319], [-122.40672552749646, 37.75756834650565], [-122.40673819348376, 37.75770785863045], [-122.40674471274193, 37.75778012926385], [-122.40675123200009, 37.75784550811005], [-122.40676855460038, 37.758021714345105], [-122.40687267646653, 37.759083794632936], [-122.40689204797653, 37.759306566997765], [-122.40689893976374, 37.75937604366338], [-122.40690434143478, 37.75945632367111], [-122.40699635610721, 37.76036808849204], [-122.40701814905594, 37.7605999878183], [-122.40702392325603, 37.7606692782194], [-122.40703193263036, 37.76074397029156], [-122.407047951379, 37.760907696803834], [-122.40708557681185, 37.76129475447449], [-122.40712096707048, 37.76165256861566], [-122.40714238749018, 37.76187478218693], [-122.40714797542574, 37.761942582471875], [-122.40715598480007, 37.762023980066715], [-122.40723514722069, 37.76293220586178], [-122.40725526378874, 37.76314249850383], [-122.40727817432459, 37.763173418414], [-122.40732250528013, 37.76321663178243], [-122.40734634713857, 37.7634701377929], [-122.407350444958, 37.76351465501296], [-122.4074109809267, 37.76414422908748], [-122.40741563753967, 37.76419284412696], [-122.40744394974658, 37.764486769538074], [-122.40746835039857, 37.76474102060663], [-122.40747058557281, 37.76476486246507], [-122.40747375206963, 37.76479727249139], [-122.40748399661818, 37.76490381579631], [-122.40749256478605, 37.76499229144288], [-122.40753056274795, 37.765388476074975], [-122.40754453258688, 37.76553432119342], [-122.407560365071, 37.76569935155733], [-122.40756781565177, 37.76578074915217], [-122.40757899152291, 37.76589288039266], [-122.40760078447165, 37.76611844672527], [-122.40766597705333, 37.766797380897344], [-122.40768460350523, 37.76701251641689], [-122.40769056396985, 37.76708162055347], [-122.40769857334416, 37.7671648807935], [-122.40771813111867, 37.767367722854786], [-122.40772204267357, 
37.76740832851995], [-122.40778406875843, 37.768054480136655], [-122.40779133307467, 37.76812954473785], [-122.40781368481696, 37.76836181659315], [-122.40782504695262, 37.76849741716305], [-122.4078315662108, 37.76857434440943], [-122.40779561715861, 37.76870025922433], [-122.40780511664909, 37.76883567352971], [-122.40781834142994, 37.76896382351884], [-122.40782541948167, 37.769033486448976], [-122.40782933103657, 37.76907018055924], [-122.40783715414636, 37.76914561768947], [-122.40790141540545, 37.769241171387755], [-122.40777456926796, 37.76938627144813], [-122.4077466295901, 37.76941644630022], [-122.40768516229879, 37.76948294273353], [-122.40764213519489, 37.769530440185896], [-122.40708632186993, 37.76999554269006], [-122.40699765995885, 37.77006986223317], [-122.40690806672517, 37.77014213286658], [-122.40686299071155, 37.7701784544478], [-122.40630158945102, 37.77063107722918], [-122.40554386538737, 37.77122619236766], [-122.40544589025033, 37.77130330587856], [-122.40536691409424, 37.77136458690534], [-122.40535909098445, 37.77137091989899], [-122.40448457906733, 37.77205842223894], [-122.40401854524058, 37.77242480454798]]
})
Now we repeat the same steps using the original (raw) trajectory rather than the map-matched one.
val originalTraj = uberData.filter($"tripId" === 11721 || $"tripId" === 10858)
.select($"latlon").cache
originalTraj: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [latlon: array<double>]
// Convert our Uber data points into GeoJson Geometries
case class UberData(`type`: String = "MultiPoint",
coordinates: Array[(Double, Double)])
object UberJsonProtocol extends DefaultJsonProtocol {
implicit val uberDataFormat = jsonFormat2(UberData)
}
import UberJsonProtocol._
val originalLatLon = originalTraj
.map(r => r.getAs[scala.collection.mutable.WrappedArray[Double]]("latlon"))
.map(point => (point(0), point(1))).collect
val originalJson = UberData(coordinates = originalLatLon).toJson.prettyPrint // Original Unmatched trajectories
defined class UberData
defined object UberJsonProtocol
import UberJsonProtocol._
originalLatLon: Array[(Double, Double)] = Array((-122.418327,37.769662), (-122.420465,37.752152), (-122.440099,37.755275), (-122.418033,37.769377), (-122.420542,37.752137), (-122.44031,37.75535), (-122.41801,37.769061), (-122.420881,37.7521), (-122.440255,37.755554), (-122.418315,37.76886), (-122.421396,37.75206), (-122.440142,37.755828), (-122.418724,37.768883), (-122.421987,37.75203), (-122.440138,37.756193), (-122.419014,37.769147), (-122.422569,37.752005), (-122.440149,37.756371), (-122.418971,37.769483), (-122.422693,37.752004), (-122.440359,37.756411), (-122.418511,37.769687), (-122.422818,37.752014), (-122.44069,37.756341), (-122.417832,37.769659), (-122.423185,37.751983), (-122.44086,37.756313), (-122.41706,37.769579), (-122.423685,37.751946), (-122.441048,37.756333), (-122.416215,37.769531), (-122.424169,37.751918), (-122.441334,37.756555), (-122.415324,37.769522), (-122.424708,37.751885), (-122.441897,37.756684), (-122.414432,37.76951), (-122.424958,37.751875), (-122.442435,37.756889), (-122.413484,37.769487), (-122.424984,37.751852), (-122.442802,37.757329), (-122.412444,37.769453), (-122.425247,37.751841), (-122.443158,37.757806), (-122.411387,37.769401), (-122.425729,37.751814), (-122.443619,37.758213), (-122.410386,37.769268), (-122.426196,37.751781), (-122.444145,37.758475), (-122.409464,37.769109), (-122.426694,37.751751), (-122.444577,37.758679), (-122.408661,37.768924), (-122.427177,37.751718), (-122.444832,37.759028), (-122.407822,37.76878), (-122.427336,37.751713), (-122.444837,37.759426), (-122.406908,37.768626), (-122.427581,37.7517), (-122.444577,37.759657), (-122.406076,37.768222), (-122.428053,37.75167), (-122.444514,37.759751), (-122.405508,37.767584), (-122.428398,37.751655), (-122.444658,37.75975), (-122.405369,37.76681), (-122.428542,37.751655), (-122.444871,37.759528), (-122.405278,37.765996), (-122.428965,37.751632), (-122.444939,37.759096), (-122.405222,37.765188), (-122.429399,37.751602), (-122.444754,37.758632), 
(-122.405166,37.764413), (-122.429542,37.751587), (-122.444395,37.758382), (-122.405225,37.763661), (-122.429883,37.751555), (-122.444256,37.758327), (-122.405476,37.762902), (-122.430437,37.751511), (-122.444097,37.758272), (-122.405923,37.762129), (-122.431027,37.751473), (-122.443826,37.758148), (-122.406369,37.761326), (-122.431551,37.751459), (-122.443456,37.757896), (-122.40657,37.760493), (-122.43174,37.751459), (-122.443045,37.757527), (-122.406438,37.759641), (-122.432052,37.751442), (-122.442714,37.757076), (-122.405972,37.758826), (-122.432547,37.751401), (-122.442393,37.756685), (-122.405289,37.758195), (-122.433013,37.751368), (-122.441899,37.756513), (-122.404561,37.757664), (-122.433527,37.75134), (-122.441371,37.756363), (-122.403931,37.757064), (-122.434029,37.751315), (-122.441165,37.755966), (-122.403497,37.756271), (-122.434547,37.751281), (-122.441389,37.75552), (-122.403226,37.754705), (-122.435054,37.751253), (-122.441737,37.755033), (-122.403133,37.753789), (-122.435511,37.751228), (-122.442034,37.754477), (-122.403107,37.753565), (-122.435945,37.751211), (-122.44236,37.754006), (-122.403045,37.752721), (-122.436152,37.7512), (-122.442662,37.753557), (-122.403134,37.751899), (-122.436356,37.751184), (-122.442805,37.753054), (-122.403397,37.751196), (-122.436847,37.751152), (-122.442904,37.752566), (-122.40367,37.750529), (-122.437464,37.751113), (-122.443011,37.752035), (-122.403916,37.74996), (-122.438115,37.751083), (-122.443169,37.751483), (-122.404188,37.749513), (-122.438337,37.75107), (-122.44333,37.750914), (-122.40475,37.749263), (-122.4385,37.751162), (-122.443481,37.750366), (-122.405186,37.749328), (-122.438535,37.75151), (-122.443608,37.74989), (-122.405446,37.749605), (-122.438569,37.751751), (-122.443739,37.749409), (-122.405722,37.750095), (-122.438591,37.751967), (-122.443995,37.749005), (-122.405981,37.750691), (-122.438646,37.752356), (-122.444297,37.748614), (-122.406143,37.751261), (-122.438655,37.752555), 
(-122.444418,37.748159), (-122.406192,37.751811), (-122.438666,37.752747), (-122.444341,37.747639), (-122.406246,37.752386), (-122.438701,37.753118), (-122.444515,37.747228), (-122.406331,37.752796), (-122.438718,37.75333), (-122.444731,37.747076), (-122.406367,37.752952), (-122.438702,37.753511), (-122.444758,37.747066), (-122.406436,37.752996), (-122.438716,37.753871), (-122.444928,37.747193), (-122.406774,37.753001), (-122.438745,37.75411), (-122.444944,37.747476), (-122.407123,37.752941), (-122.438767,37.75422), (-122.444931,37.747834), (-122.407234,37.752931), (-122.43883,37.754559), (-122.444924,37.748234), (-122.407593,37.752915), (-122.438898,37.755016), (-122.444938,37.748669), (-122.407987,37.752886), (-122.438914,37.755296), (-122.444866,37.749047), (-122.408148,37.752862), (-122.438889,37.755366), (-122.444752,37.749373), (-122.408531,37.752836), (-122.438864,37.755555), (-122.444881,37.749725), (-122.408922,37.752817), (-122.438884,37.755852), (-122.445259,37.750076), (-122.408958,37.752814), (-122.438931,37.756233), (-122.445615,37.750375), (-122.409201,37.752817), (-122.438969,37.756521), (-122.445711,37.750517), (-122.409683,37.752807), (-122.43901,37.756796), (-122.445731,37.75057), (-122.409911,37.752785), (-122.439033,37.757037), (-122.445745,37.750608), (-122.410194,37.752723), (-122.439219,37.757126), (-122.445524,37.750736), (-122.410691,37.752689), (-122.439402,37.757132), (-122.445241,37.751019), (-122.410904,37.752679), (-122.439496,37.75712), (-122.445072,37.751368), (-122.411205,37.752666), (-122.439774,37.757185), (-122.445045,37.751731), (-122.411634,37.752642), (-122.440133,37.757251), (-122.445205,37.752141), (-122.411762,37.752633), (-122.440426,37.757365), (-122.445419,37.752573), (-122.422617,37.770549), (-122.411999,37.752599), (-122.440673,37.757582), (-122.445379,37.752959), (-122.422602,37.770505), (-122.412454,37.752544), (-122.440823,37.757866), (-122.445243,37.753398), (-122.42258,37.770472), (-122.412856,37.752522), 
(-122.440919,37.758086), (-122.445065,37.753805), (-122.422556,37.770377), (-122.412994,37.752521), (-122.440957,37.758141), (-122.444786,37.754166), (-122.4225,37.770249), (-122.413365,37.752504), (-122.440916,37.758093), (-122.44464,37.754517), (-122.422447,37.770132), (-122.413775,37.752444), (-122.440873,37.758018), (-122.444673,37.754914), (-122.422431,37.770048), (-122.413914,37.752431), (-122.440795,37.757945), (-122.444669,37.755273), (-122.422408,37.770006), (-122.414045,37.752478), (-122.44073,37.757726), (-122.444413,37.755485), (-122.422391,37.76999), (-122.414408,37.752469), (-122.440588,37.757431), (-122.444293,37.755533), (-122.422259,37.769866), (-122.41493,37.752428), (-122.440358,37.757253), (-122.42187,37.769849), (-122.415443,37.752393), (-122.439989,37.757148), (-122.4214,37.769876), (-122.415963,37.752366), (-122.439585,37.757045), (-122.421024,37.769912), (-122.416165,37.75235), (-122.439192,37.757043), (-122.420889,37.76993), (-122.416402,37.752351), (-122.439015,37.756936), (-122.420832,37.769942), (-122.41688,37.75233), (-122.438978,37.756708), (-122.420759,37.769902), (-122.417439,37.752294), (-122.438946,37.756414), (-122.420504,37.769874), (-122.417951,37.752258), (-122.438909,37.756055), (-122.420153,37.769848), (-122.418225,37.752245), (-122.438872,37.755653), (-122.419805,37.769819), (-122.41827,37.752243), (-122.438862,37.755536), (-122.419466,37.769781), (-122.418785,37.752229), (-122.438864,37.755472), (-122.418976,37.769726), (-122.419291,37.752216), (-122.438915,37.755413), (-122.418672,37.769685), (-122.419473,37.752208), (-122.43896,37.755403), (-122.418487,37.769666), (-122.419781,37.752187), (-122.439242,37.755366), (-122.418453,37.769667), (-122.420279,37.752162), (-122.439668,37.755331))
originalJson: String =
{
"type": "MultiPoint",
"coordinates": [[-122.418327, 37.769662], [-122.420465, 37.752152], [-122.440099, 37.755275], [-122.418033, 37.769377], [-122.420542, 37.752137], [-122.44031, 37.75535], [-122.41801, 37.769061], [-122.420881, 37.7521], [-122.440255, 37.755554], [-122.418315, 37.76886], [-122.421396, 37.75206], [-122.440142, 37.755828], [-122.418724, 37.768883], [-122.421987, 37.75203], [-122.440138, 37.756193], [-122.419014, 37.769147], [-122.422569, 37.752005], [-122.440149, 37.756371], [-122.418971, 37.769483], [-122.422693, 37.752004], [-122.440359, 37.756411], [-122.418511, 37.769687], [-122.422818, 37.752014], [-122.44069, 37.756341], [-122.417832, 37.769659], [-122.423185, 37.751983], [-122.44086, 37.756313], [-122.41706, 37.769579], [-122.423685, 37.751946], [-122.441048, 37.756333], [-122.416215, 37.769531], [-122.424169, 37.751918], [-122.441334, 37.756555], [-122.415324, 37.769522], [-122.424708, 37.751885], [-122.441897, 37.756684], [-122.414432, 37.76951], [-122.424958, 37.751875], [-122.442435, 37.756889], [-122.413484, 37.769487], [-122.424984, 37.751852], [-122.442802, 37.757329], [-122.412444, 37.769453], [-122.425247, 37.751841], [-122.443158, 37.757806], [-122.411387, 37.769401], [-122.425729, 37.751814], [-122.443619, 37.758213], [-122.410386, 37.769268], [-122.426196, 37.751781], [-122.444145, 37.758475], [-122.409464, 37.769109], [-122.426694, 37.751751], [-122.444577, 37.758679], [-122.408661, 37.768924], [-122.427177, 37.751718], [-122.444832, 37.759028], [-122.407822, 37.76878], [-122.427336, 37.751713], [-122.444837, 37.759426], [-122.406908, 37.768626], [-122.427581, 37.7517], [-122.444577, 37.759657], [-122.406076, 37.768222], [-122.428053, 37.75167], [-122.444514, 37.759751], [-122.405508, 37.767584], [-122.428398, 37.751655], [-122.444658, 37.75975], [-122.405369, 37.76681], [-122.428542, 37.751655], [-122.444871, 37.759528], [-122.405278, 37.765996], [-122.428965, 37.751632], [-122.444939, 37.759096], [-122.405222, 37.765188], [-122.429399, 37.751602], 
[-122.444754, 37.758632], [-122.405166, 37.764413], [-122.429542, 37.751587], [-122.444395, 37.758382], [-122.405225, 37.763661], [-122.429883, 37.751555], [-122.444256, 37.758327], [-122.405476, 37.762902], [-122.430437, 37.751511], [-122.444097, 37.758272], [-122.405923, 37.762129], [-122.431027, 37.751473], [-122.443826, 37.758148], [-122.406369, 37.761326], [-122.431551, 37.751459], [-122.443456, 37.757896], [-122.40657, 37.760493], [-122.43174, 37.751459], [-122.443045, 37.757527], [-122.406438, 37.759641], [-122.432052, 37.751442], [-122.442714, 37.757076], [-122.405972, 37.758826], [-122.432547, 37.751401], [-122.442393, 37.756685], [-122.405289, 37.758195], [-122.433013, 37.751368], [-122.441899, 37.756513], [-122.404561, 37.757664], [-122.433527, 37.75134], [-122.441371, 37.756363], [-122.403931, 37.757064], [-122.434029, 37.751315], [-122.441165, 37.755966], [-122.403497, 37.756271], [-122.434547, 37.751281], [-122.441389, 37.75552], [-122.403226, 37.754705], [-122.435054, 37.751253], [-122.441737, 37.755033], [-122.403133, 37.753789], [-122.435511, 37.751228], [-122.442034, 37.754477], [-122.403107, 37.753565], [-122.435945, 37.751211], [-122.44236, 37.754006], [-122.403045, 37.752721], [-122.436152, 37.7512], [-122.442662, 37.753557], [-122.403134, 37.751899], [-122.436356, 37.751184], [-122.442805, 37.753054], [-122.403397, 37.751196], [-122.436847, 37.751152], [-122.442904, 37.752566], [-122.40367, 37.750529], [-122.437464, 37.751113], [-122.443011, 37.752035], [-122.403916, 37.74996], [-122.438115, 37.751083], [-122.443169, 37.751483], [-122.404188, 37.749513], [-122.438337, 37.75107], [-122.44333, 37.750914], [-122.40475, 37.749263], [-122.4385, 37.751162], [-122.443481, 37.750366], [-122.405186, 37.749328], [-122.438535, 37.75151], [-122.443608, 37.74989], [-122.405446, 37.749605], [-122.438569, 37.751751], [-122.443739, 37.749409], [-122.405722, 37.750095], [-122.438591, 37.751967], [-122.443995, 37.749005], [-122.405981, 37.750691], [-122.438646, 
37.752356], [-122.444297, 37.748614], [-122.406143, 37.751261], [-122.438655, 37.752555], [-122.444418, 37.748159], [-122.406192, 37.751811], [-122.438666, 37.752747], [-122.444341, 37.747639], [-122.406246, 37.752386], [-122.438701, 37.753118], [-122.444515, 37.747228], [-122.406331, 37.752796], [-122.438718, 37.75333], [-122.444731, 37.747076], [-122.406367, 37.752952], [-122.438702, 37.753511], [-122.444758, 37.747066], [-122.406436, 37.752996], [-122.438716, 37.753871], [-122.444928, 37.747193], [-122.406774, 37.753001], [-122.438745, 37.75411], [-122.444944, 37.747476], [-122.407123, 37.752941], [-122.438767, 37.75422], [-122.444931, 37.747834], [-122.407234, 37.752931], [-122.43883, 37.754559], [-122.444924, 37.748234], [-122.407593, 37.752915], [-122.438898, 37.755016], [-122.444938, 37.748669], [-122.407987, 37.752886], [-122.438914, 37.755296], [-122.444866, 37.749047], [-122.408148, 37.752862], [-122.438889, 37.755366], [-122.444752, 37.749373], [-122.408531, 37.752836], [-122.438864, 37.755555], [-122.444881, 37.749725], [-122.408922, 37.752817], [-122.438884, 37.755852], [-122.445259, 37.750076], [-122.408958, 37.752814], [-122.438931, 37.756233], [-122.445615, 37.750375], [-122.409201, 37.752817], [-122.438969, 37.756521], [-122.445711, 37.750517], [-122.409683, 37.752807], [-122.43901, 37.756796], [-122.445731, 37.75057], [-122.409911, 37.752785], [-122.439033, 37.757037], [-122.445745, 37.750608], [-122.410194, 37.752723], [-122.439219, 37.757126], [-122.445524, 37.750736], [-122.410691, 37.752689], [-122.439402, 37.757132], [-122.445241, 37.751019], [-122.410904, 37.752679], [-122.439496, 37.75712], [-122.445072, 37.751368], [-122.411205, 37.752666], [-122.439774, 37.757185], [-122.445045, 37.751731], [-122.411634, 37.752642], [-122.440133, 37.757251], [-122.445205, 37.752141], [-122.411762, 37.752633], [-122.440426, 37.757365], [-122.445419, 37.752573], [-122.422617, 37.770549], [-122.411999, 37.752599], [-122.440673, 37.757582], [-122.445379, 
37.752959], [-122.422602, 37.770505], [-122.412454, 37.752544], [-122.440823, 37.757866], [-122.445243, 37.753398], [-122.42258, 37.770472], [-122.412856, 37.752522], [-122.440919, 37.758086], [-122.445065, 37.753805], [-122.422556, 37.770377], [-122.412994, 37.752521], [-122.440957, 37.758141], [-122.444786, 37.754166], [-122.4225, 37.770249], [-122.413365, 37.752504], [-122.440916, 37.758093], [-122.44464, 37.754517], [-122.422447, 37.770132], [-122.413775, 37.752444], [-122.440873, 37.758018], [-122.444673, 37.754914], [-122.422431, 37.770048], [-122.413914, 37.752431], [-122.440795, 37.757945], [-122.444669, 37.755273], [-122.422408, 37.770006], [-122.414045, 37.752478], [-122.44073, 37.757726], [-122.444413, 37.755485], [-122.422391, 37.76999], [-122.414408, 37.752469], [-122.440588, 37.757431], [-122.444293, 37.755533], [-122.422259, 37.769866], [-122.41493, 37.752428], [-122.440358, 37.757253], [-122.42187, 37.769849], [-122.415443, 37.752393], [-122.439989, 37.757148], [-122.4214, 37.769876], [-122.415963, 37.752366], [-122.439585, 37.757045], [-122.421024, 37.769912], [-122.416165, 37.75235], [-122.439192, 37.757043], [-122.420889, 37.76993], [-122.416402, 37.752351], [-122.439015, 37.756936], [-122.420832, 37.769942], [-122.41688, 37.75233], [-122.438978, 37.756708], [-122.420759, 37.769902], [-122.417439, 37.752294], [-122.438946, 37.756414], [-122.420504, 37.769874], [-122.417951, 37.752258], [-122.438909, 37.756055], [-122.420153, 37.769848], [-122.418225, 37.752245], [-122.438872, 37.755653], [-122.419805, 37.769819], [-122.41827, 37.752243], [-122.438862, 37.755536], [-122.419466, 37.769781], [-122.418785, 37.752229], [-122.438864, 37.755472], [-122.418976, 37.769726], [-122.419291, 37.752216], [-122.438915, 37.755413], [-122.418672, 37.769685], [-122.419473, 37.752208], [-122.43896, 37.755403], [-122.418487, 37.769666], [-122.419781, 37.752187], [-122.439242, 37.755366], [-122.418453, 37.769667], [-122.420279, 37.752162], [-122.439668, 37.755331]]
}
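The shape of the GeoJSON that `UberData(...).toJson.prettyPrint` produces above can be illustrated with a small plain-Scala sketch (no Spark or spray-json required); the two sample points are taken from the start of `originalLatLon`:

```scala
// Minimal sketch of the "MultiPoint" GeoJSON structure that the
// spray-json serialization of UberData emits in the cell above.
object MultiPointSketch {
  // Render (lon, lat) pairs as a GeoJSON MultiPoint geometry string
  def toGeoJson(coordinates: Array[(Double, Double)]): String = {
    val coords = coordinates
      .map { case (lon, lat) => s"[$lon, $lat]" }
      .mkString(", ")
    s"""{"type": "MultiPoint", "coordinates": [$coords]}"""
  }

  def main(args: Array[String]): Unit = {
    val pts = Array((-122.418327, 37.769662), (-122.420465, 37.752152))
    println(toGeoJson(pts))
  }
}
```

In the notebook itself spray-json does this work via `jsonFormat2(UberData)`; this sketch only shows the target shape that Leaflet will consume.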
- Display the map-matched trajectories together with the original one
val trajHTML = genLeafletHTML(mapMatchedTrajectories ++ Array(originalJson))
displayHTML(trajHTML) // zoom and play - orange dots are raw and azure dots are map-matched
Visualization & map matching: further things one could do

- Show the direction of travel.
- Get timestamps for the points; currently GraphHopper map matching does not preserve this information.
- Map the matched coordinates to OSM Way IDs. See here for how to extract OSM IDs from the GraphHopper graph edges; note, however, that this requires the 0.6-SNAPSHOT of GraphHopper to work.
Another option is to reverse geocode the matched coordinates with a service such as http://nominatim.openstreetmap.org/.
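As a rough sketch of the reverse-geocoding idea, one could build a request against Nominatim's `/reverse` endpoint. The `format`, `lat`, and `lon` parameters are part of Nominatim's documented API; `zoom=17` (street level) is an assumption made here for illustration:

```scala
// Sketch: constructing a Nominatim reverse-geocoding request for one
// matched point. Only the URL is built here; the actual HTTP call is
// left commented out (mind Nominatim's usage policy: set a User-Agent
// and throttle to about one request per second).
object NominatimSketch {
  def reverseUrl(lat: Double, lon: Double): String =
    s"http://nominatim.openstreetmap.org/reverse?format=json&lat=$lat&lon=$lon&zoom=17"

  def main(args: Array[String]): Unit = {
    val url = reverseUrl(37.769662, -122.418327)
    println(url)
    // val json = scala.io.Source.fromURL(url).mkString // uncomment to query
  }
}
```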
Step 0.1: Loading our OSM Data
(Only needs to be done once per OSM Map)
See https://download.bbbike.org/osm/bbbike/SanFrancisco/ to download the OSM data for San Francisco in pbf format.
#curl -O https://download.bbbike.org/osm/bbbike/SanFrancisco/SanFrancisco.osm.pbf
The osm.pbf file below was downloaded from the above link, as Marina pointed out, and the following details were received via email:
your requested OpenStreetMap area 'San Francisco' was extracted from planet.osm
To download the file, please click on the following link:
https://download.bbbike.org/osm/extract/planet_-122.529,37.724_-122.352,37.811.osm.pbf
The file will be available for the next 48 hours. Please download the
file as soon as possible.
Name: San Francisco
Coordinates: -122.529,37.724 x -122.352,37.811
Script URL: https://extract.bbbike.org/?sw_lng=-122.529&sw_lat=37.724&ne_lng=-122.352&ne_lat=37.811&format=osm.pbf&city=San%20Francisco
Square kilometre: 150
Granularity: 100 (1.1 cm)
Format: osm.pbf
File size: 8.5 MB
SHA256 checksum: 8fe277a3b23ebd5a612d21cc50a5287bae3a169867c631353e9a1da3963cd617
MD5 checksum: 9d2c5650547623bbca1656db84efeb7d
Last planet.osm database update: Thu May 3 05:46:46 2018 UTC
License: OpenStreetMap License
Please read the extract online help for more informations:
https://extract.bbbike.org/extract.html
and the much smaller map has these details:
your requested OpenStreetMap area 'San Francisco' was extracted from planet.osm
To download the file, please click on the following link:
https://download.bbbike.org/osm/extract/planet_-122.449,37.747_-122.397,37.772.osm.pbf
The file will be available for the next 48 hours. Please download the
file as soon as possible.
Name: San Francisco
Coordinates: -122.449,37.747 x -122.397,37.772
Script URL: https://extract.bbbike.org/?sw_lng=-122.449&sw_lat=37.747&ne_lng=-122.397&ne_lat=37.772&format=osm.pbf&city=San%20Francisco
Square kilometre: 12
Granularity: 100 (1.1 cm)
Format: osm.pbf
File size: 1.3 MB
SHA256 checksum: 4fa2c4137e9eabdacc840ebcd9f741470c617c43d4d852d528e1baa44d2fb190
MD5 checksum: 38f2954459efa8d95f65a16f844adebf
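Since bbbike extracts are only available for 48 hours, it is worth verifying a download against the published checksum before moving it into dbfs. A minimal sketch follows; the temporary file here is a small stand-in fixture, not the real `.osm.pbf`, and in practice you would compare `sha256_of("SanFrancisco.osm.pbf")` against the SHA256 value quoted in the email.

```python
import hashlib, os, tempfile

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Demonstrate on a small temporary file (stand-in for the .osm.pbf download):
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"osm extract contents")
tmp.close()
digest = sha256_of(tmp.name)
os.unlink(tmp.name)
print(digest)
```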
# smaller SF osm.pbf file as the driver crashes with the above larger map
curl -O https://download.bbbike.org/osm/bbbike/SanFrancisco/SanFrancisco.osm.pbf # nearly 17MB and too big for community edition...
#curl -O https://download.bbbike.org/osm/extract/planet_-122.529,37.724_-122.352,37.811.osm.pbf
#curl -O https://download.bbbike.org/osm/extract/planet_-122.449,37.747_-122.397,37.772.osm.gz # much smaller map of SF
# backups in progress here... http://lamastex.org/.../SanFrancisco_-122.529_37.724__-122.352_37.811.osm.pbf
ls
SanFrancisco.osm.pbf
conf
derby.log
eventlogs
logs
dbutils.fs.mkdirs("dbfs:/files/graphhopper/osm/")
res1: Boolean = true
dbutils.fs.rm("dbfs:/datasets/graphhopper/osm/SanFrancisco.osm.pbf",recurse=true) // to remove any pre-existing file with same name in dbfs
res2: Boolean = false
dbutils.fs.mv("file:/databricks/driver/SanFrancisco.osm.pbf", "dbfs:/files/graphhopper/osm/SanFrancisco.osm.pbf") // too big for driver memory
//dbutils.fs.mv("file:/databricks/driver/planet_-122.529,37.724_-122.352,37.811.osm.pbf", "dbfs:/datasets/graphhopper/osm/SanFrancisco.osm.pbf")
//dbutils.fs.mv("file:/databricks/driver/planet_-122.449,37.747_-122.397,37.772.osm.gz", "dbfs:/files/graphhopper/osm/SanFranciscoSmall.osm.gz")
res3: Boolean = true
display(dbutils.fs.ls("dbfs:/files/graphhopper/osm"))
| path | name | size |
|---|---|---|
| dbfs:/files/graphhopper/osm/SanFrancisco.osm.pbf | SanFrancisco.osm.pbf | 2.1059693e7 |
dbutils.fs.mkdirs("dbfs:/files/graphhopper/graphHopperData") // Where graphhopper will store its data
res5: Boolean = true
Process an OSM file, creating a GraphHopper graph from it. The contents of this graph are then stored in the distributed filesystem for later use. This ensures that the processing step only takes place once; subsequent GraphHopper objects can simply read these files to start map matching.
val osmPath = "/dbfs/files/graphhopper/osm/SanFrancisco.osm.pbf"
val graphHopperPath = "/dbfs/files/graphhopper/graphHopperData"
osmPath: String = /dbfs/files/graphhopper/osm/SanFrancisco.osm.pbf
graphHopperPath: String = /dbfs/files/graphhopper/graphHopperData
val encoder = new CarFlagEncoder()
val hopper = new GraphHopper()
.setStoreOnFlush(true)
.setEncodingManager(new EncodingManager(encoder))
.setOSMFile(osmPath)
.setCHWeightings("shortest")
.setGraphHopperLocation("graphhopper/")
hopper.importOrLoad()
encoder: com.graphhopper.routing.util.CarFlagEncoder = car
hopper: com.graphhopper.GraphHopper = com.graphhopper.GraphHopper@5d5ac119
res6: com.graphhopper.GraphHopper = com.graphhopper.GraphHopper@5d5ac119
Move the GraphHopper object to dbfs:
dbutils.fs.mv("file:/databricks/driver/graphhopper", "dbfs:/files/graphhopper/graphHopperData", recurse=true)
res7: Boolean = true
display(dbutils.fs.ls("dbfs:/files/graphhopper/graphHopperData"))
| path | name | size |
|---|---|---|
| dbfs:/files/graphhopper/graphHopperData/edges | edges | 3145828.0 |
| dbfs:/files/graphhopper/graphHopperData/geometry | geometry | 1048676.0 |
| dbfs:/files/graphhopper/graphHopperData/location_index | location_index | 1048676.0 |
| dbfs:/files/graphhopper/graphHopperData/names | names | 1048676.0 |
| dbfs:/files/graphhopper/graphHopperData/nodes | nodes | 1048676.0 |
| dbfs:/files/graphhopper/graphHopperData/nodes_ch_shortest_car | nodes_ch_shortest_car | 1048676.0 |
| dbfs:/files/graphhopper/graphHopperData/properties | properties | 32868.0 |
| dbfs:/files/graphhopper/graphHopperData/shortcuts_shortest_car | shortcuts_shortest_car | 3145828.0 |
This notebook is originally from: (link not working)
- https://cdn2.hubspot.net/hubfs/438089/notebooks/MobileSample.html
You can download a tiny sample dataset from here:
wget http://lamastex.org/datasets/public/geospatial/misc/mobile_sample.csv
The main purpose is to show how SQL can be used for geospatial data at the resolution of countries.
Mobile Sample Data (Sample)
This notebook contains various chart examples based on a sample mobile phone dataset.
- Note that this dataset joins the mobile sample table and the country codes.
- Notice that the country names do not match completely, hence the use of the case statement within the join.
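The case-statement trick for mismatched country names can be illustrated in miniature with SQLite standing in for Spark SQL; the table contents and name mappings below are made up for the sketch, not taken from the actual datasets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE mobile (client_id INTEGER, country TEXT)")
cur.execute("CREATE TABLE codes (country TEXT, code3 TEXT)")
cur.executemany("INSERT INTO mobile VALUES (?, ?)",
                [(1, "United States"), (2, "South Korea")])
cur.executemany("INSERT INTO codes VALUES (?, ?)",
                [("United States of America", "USA"), ("Korea, South", "KOR")])

# The two tables spell country names differently, so normalise one side
# with a CASE expression inside the join condition:
rows = cur.execute("""
    SELECT m.client_id, c.code3
    FROM mobile m
    JOIN codes c
      ON c.country = CASE m.country
                       WHEN 'United States' THEN 'United States of America'
                       WHEN 'South Korea'   THEN 'Korea, South'
                       ELSE m.country
                     END
    ORDER BY m.client_id
""").fetchall()
print(rows)  # [(1, 'USA'), (2, 'KOR')]
```

Without the CASE expression the join would silently drop every row whose country name differs between the two tables.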
Mobile Devices by Geography (Sample Data)
This is a world map of the number of mobile phones by country, from a sample dataset.
Loading Data:
From http://lamastex.org/datasets/public/geospatial/misc/mobile_sample.csv
(Only needs to be done once per cluster)
wget http://lamastex.org/datasets/public/geospatial/misc/mobile_sample.csv
--2022-02-02 16:28:20-- http://lamastex.org/datasets/public/geospatial/misc/mobile_sample.csv
Resolving lamastex.org (lamastex.org)... 166.62.28.100
Connecting to lamastex.org (lamastex.org)|166.62.28.100|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1713 (1.7K) [text/csv]
Saving to: ‘mobile_sample.csv’
0K . 100% 175M=0s
2022-02-02 16:28:20 (175 MB/s) - ‘mobile_sample.csv’ saved [1713/1713]
pwd
/databricks/driver
dbutils.fs.mkdirs("dbfs:/datasets/mobile_sample")
res0: Boolean = true
dbutils.fs.cp("file:/databricks/driver/mobile_sample.csv", "dbfs:/datasets/mobile_sample/") // load into dbfs
res1: Boolean = true
display(dbutils.fs.ls("dbfs:/datasets/mobile_sample/"))
| path | name | size |
|---|---|---|
| dbfs:/datasets/mobile_sample/mobile_sample.csv | mobile_sample.csv | 1713.0 |
Create SQL tables for each dataset.
CREATE TABLE mobile_sample USING com.databricks.spark.csv OPTIONS(path 'dbfs:/datasets/mobile_sample/mobile_sample.csv', header "true")
select * from mobile_sample
| CountryCode3 | Apple | HTC | ASUS | LG | DELL | Huawei | FujitsuToshibaMobileCommun | Archos | Casio | Kyocera |
|---|---|---|---|---|---|---|---|---|---|---|
| ARE | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ARG | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| AUS | 20 | 27 | 0 | 8 | 0 | 0 | 0 | 0 | 0 | 0 |
| AUT | 1 | 4 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| BEL | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| BGD | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| BHS | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| BMU | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| BRA | 6 | 3 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| BRN | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| CAN | 1 | 12 | 0 | 46 | 1 | 0 | 0 | 0 | 0 | 0 |
| CHE | 10 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| CHN | 0 | 58 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| CYP | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| CZE | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| DEU | 1 | 20 | 0 | 22 | 0 | 0 | 0 | 0 | 0 | 0 |
| DNK | 1 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| EGY | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| ESP | 0 | 18 | 0 | 11 | 0 | 0 | 0 | 0 | 0 | 0 |
| ETH | 4 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 |
| FIN | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| FJI | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| FRA | 4 | 32 | 0 | 8 | 0 | 0 | 0 | 0 | 0 | 0 |
| GGY | 26 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| GIB | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| GRC | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| GUM | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| HKG | 0 | 1 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 |
| HTI | 28 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| HUN | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| IDN | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| IND | 9 | 14 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| IRL | 0 | 8 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| ITA | 1 | 3 | 0 | 21 | 0 | 0 | 0 | 0 | 0 | 0 |
| JAM | 2 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| JPN | 5 | 6 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 |
| KAZ | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| KHM | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| LCA | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| LUX | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| LVA | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| MAR | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| MEX | 0 | 0 | 0 | 16 | 0 | 0 | 0 | 0 | 0 | 0 |
| MLT | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| MMR | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| MTQ | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| MUS | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| MYS | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NGA | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NLD | 4 | 6 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| NOR | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NPL | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NZL | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| PAK | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| PHL | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| POL | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| RUS | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| SGP | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| SRB | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| SWE | 1 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| THA | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| TUR | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| UKR | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| USA | 21004 | 2554 | 42 | 7940 | 52 | 229 | 0 | 1 | 996 | 117 |
| VNM | 0 | 5 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| ZAF | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
The next cell doesn't work:
- The mobile_sample table does not contain the ClientID, DeviceMake, or Country columns.
- The data needed to create the country codes table is missing.
select m.ClientID, c.CountryCode3, m.DeviceMake
from mobile_sample m
join countrycodes c
on m.Country = c.Country
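A cheap guard against this kind of failure is to inspect a table's columns before writing joins against it. A sketch with SQLite standing in for Spark SQL (the column list mirrors the actual mobile_sample schema only loosely and is an assumption for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mobile_sample (CountryCode3 TEXT, Apple INTEGER, HTC INTEGER)")

def columns_of(conn, table):
    """Return the column names of a table via PRAGMA table_info."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

cols = columns_of(conn, "mobile_sample")
# The columns the join above would need, minus what the table actually has:
missing = {"ClientID", "DeviceMake", "Country"} - set(cols)
print(sorted(missing))
```

In the Spark SQL setting the same check would be a `DESCRIBE mobile_sample` or a look at the DataFrame's schema before running the join.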
cache table mobile_sample
select DeviceMake, count(1) as DeviceCnt from mobile_sample where Country = 'United States' group by DeviceMake order by DeviceCnt desc limit 10
select m.clientid, s.StateCodes from mobile_sample m join state_codes s on s.state = m.state
select m.clientid, m.DeviceMake, s.StateCodes from mobile_sample m join state_codes s on s.state = m.state
| clientid | DeviceMake | StateCodes |
|---|---|---|
| 4688.0 | RIM | VA |
| 4688.0 | RIM | VA |
| 4688.0 | RIM | VA |
| 5251.0 | Apple | VA |
| 6056.0 | Samsung | VA |
| 7130.0 | SAMSUNG | VA |
| 7162.0 | Samsung | VA |
| 7162.0 | Samsung | VA |
| 9530.0 | HTC | VA |
| 11561.0 | Apple | VA |
| 13511.0 | Apple | VA |
| 14090.0 | Apple | VA |
| 16420.0 | RIM | VA |
| 16495.0 | Apple | VA |
| 16495.0 | Apple | VA |
| 16495.0 | Apple | VA |
| 16495.0 | Apple | VA |
| 16495.0 | Apple | VA |
| 16495.0 | Apple | VA |
| 16665.0 | LG | VA |
| 16665.0 | LG | VA |
| 17100.0 | Apple | VA |
| 17100.0 | Apple | VA |
| 17100.0 | Apple | VA |
| 17100.0 | Apple | VA |
| 17100.0 | Apple | VA |
| 18803.0 | Apple | VA |
| 18803.0 | Apple | VA |
| 18803.0 | Apple | VA |
| 18803.0 | Apple | VA |
| 18803.0 | Apple | VA |
| 18803.0 | Apple | VA |
| 18803.0 | Apple | VA |
| 18855.0 | HTC | VA |
| 18855.0 | HTC | VA |
| 18855.0 | HTC | VA |
| 18855.0 | HTC | VA |
| 18855.0 | HTC | VA |
| 18855.0 | HTC | VA |
| 18855.0 | HTC | VA |
| 18855.0 | HTC | VA |
| 18855.0 | HTC | VA |
| 18855.0 | HTC | VA |
| 20058.0 | SAMSUNG | VA |
| 21120.0 | HTC | VA |
| 21120.0 | HTC | VA |
| 21120.0 | HTC | VA |
| 21120.0 | HTC | VA |
| 21120.0 | HTC | VA |
| 21120.0 | HTC | VA |
| 21120.0 | HTC | VA |
| 21120.0 | HTC | VA |
| 23227.0 | Apple | VA |
| 23386.0 | Apple | VA |
| 23386.0 | Apple | VA |
| 23386.0 | Apple | VA |
| 23386.0 | Apple | VA |
| 23386.0 | Apple | VA |
| 23386.0 | Apple | VA |
| 23386.0 | Apple | VA |
| 25174.0 | Apple | VA |
| 26483.0 | Motorola | VA |
| 26483.0 | Motorola | VA |
| 26844.0 | Motorola | VA |
| 27613.0 | Apple | VA |
| 27616.0 | Samsung | VA |
| 27616.0 | Samsung | VA |
| 28703.0 | Apple | VA |
| 34264.0 | HTC | VA |
| 34409.0 | Apple | VA |
| 34409.0 | Apple | VA |
| 34409.0 | Apple | VA |
| 34409.0 | Apple | VA |
| 38897.0 | Apple | VA |
| 41623.0 | Samsung | VA |
| 41623.0 | Samsung | VA |
| 41623.0 | Samsung | VA |
| 41623.0 | Samsung | VA |
| 41994.0 | LG | VA |
| 41994.0 | LG | VA |
| 42108.0 | Apple | VA |
| 42108.0 | Apple | VA |
| 42108.0 | Apple | VA |
| 42108.0 | Apple | VA |
| 42108.0 | Apple | VA |
| 43885.0 | Apple | VA |
| 44680.0 | Apple | VA |
| 45525.0 | Unknown | VA |
| 45525.0 | Unknown | VA |
| 46999.0 | Samsung | VA |
| 46999.0 | Samsung | VA |
| 46999.0 | Samsung | VA |
| 47088.0 | Unknown | VA |
| 50378.0 | HTC | VA |
| 50378.0 | HTC | VA |
| 50378.0 | HTC | VA |
| 50378.0 | HTC | VA |
| 50378.0 | HTC | VA |
| 50378.0 | HTC | VA |
| 50378.0 | HTC | VA |
| 50378.0 | HTC | VA |
| 50378.0 | HTC | VA |
| 50523.0 | Samsung | VA |
| 50523.0 | Samsung | VA |
| 50523.0 | Samsung | VA |
| 50523.0 | Samsung | VA |
| 50523.0 | Samsung | VA |
| 55259.0 | Apple | VA |
| 55259.0 | Apple | VA |
| 55259.0 | Apple | VA |
| 55259.0 | Apple | VA |
| 55259.0 | Apple | VA |
| 55259.0 | Apple | VA |
| 55958.0 | Apple | VA |
| 55958.0 | Apple | VA |
| 55958.0 | Apple | VA |
| 55958.0 | Apple | VA |
| 55958.0 | Apple | VA |
| 56836.0 | Apple | VA |
| 56836.0 | Apple | VA |
| 56836.0 | Apple | VA |
| 56836.0 | Apple | VA |
| 56836.0 | Apple | VA |
| 56836.0 | Apple | VA |
| 56836.0 | Apple | VA |
| 56836.0 | Apple | VA |
| 56836.0 | Apple | VA |
| 56836.0 | Apple | VA |
| 56836.0 | Apple | VA |
| 56836.0 | Apple | VA |
| 56836.0 | Apple | VA |
| 56836.0 | Apple | VA |
| 56836.0 | Apple | VA |
| 56836.0 | Apple | VA |
| 56836.0 | Apple | VA |
| 56836.0 | Apple | VA |
| 56836.0 | Apple | VA |
| 56836.0 | Apple | VA |
| 56836.0 | Apple | VA |
| 56836.0 | Apple | VA |
| 57116.0 | Apple | VA |
| 57116.0 | Apple | VA |
| 57116.0 | Apple | VA |
| 57116.0 | Apple | VA |
| 57116.0 | Apple | VA |
| 57116.0 | Apple | VA |
| 57116.0 | Apple | VA |
| 57116.0 | Apple | VA |
| 57116.0 | Apple | VA |
| 57116.0 | Apple | VA |
| 57116.0 | Apple | VA |
| 57116.0 | Apple | VA |
| 57116.0 | Apple | VA |
| 57116.0 | Apple | VA |
| 57116.0 | Apple | VA |
| 57116.0 | Apple | VA |
| 58197.0 | Apple | VA |
| 58197.0 | Apple | VA |
| 58197.0 | Apple | VA |
| 58197.0 | Apple | VA |
| 58197.0 | Apple | VA |
| 58197.0 | Apple | VA |
| 58197.0 | Apple | VA |
| 58197.0 | Apple | VA |
| 59178.0 | Apple | VA |
| 59178.0 | Apple | VA |
| 59178.0 | Apple | VA |
| 59178.0 | Apple | VA |
| 59178.0 | Apple | VA |
| 59178.0 | Apple | VA |
| 63752.0 | Apple | VA |
| 63752.0 | Apple | VA |
| 63752.0 | Apple | VA |
| 63752.0 | Apple | VA |
| 66119.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70552.0 | Apple | VA |
| 70675.0 | Samsung | VA |
| 70675.0 | Samsung | VA |
| 71722.0 | Apple | VA |
| 71722.0 | Apple | VA |
| 73079.0 | Samsung | VA |
| 73130.0 | Apple | VA |
| 73240.0 | Apple | VA |
| 74962.0 | Apple | VA |
| 75052.0 | Apple | VA |
| 76722.0 | Apple | VA |
| 76784.0 | Apple | VA |
| 76784.0 | Apple | VA |
| 76784.0 | Apple | VA |
| 78472.0 | Apple | VA |
| 78472.0 | Apple | VA |
| 78472.0 | Apple | VA |
| 80120.0 | Unknown | VA |
| 81230.0 | Apple | VA |
| 84397.0 | Apple | VA |
| 84397.0 | Apple | VA |
| 84397.0 | Apple | VA |
| 84397.0 | Apple | VA |
| 84648.0 | RIM | VA |
| 84648.0 | RIM | VA |
| 84648.0 | RIM | VA |
| 84648.0 | RIM | VA |
| 84648.0 | RIM | VA |
| 84648.0 | RIM | VA |
| 84648.0 | RIM | VA |
| 84648.0 | RIM | VA |
| 84648.0 | RIM | VA |
| 84648.0 | RIM | VA |
| 84680.0 | Apple | VA |
| 84680.0 | Apple | VA |
| 86418.0 | Apple | VA |
| 86418.0 | Apple | VA |
| 86418.0 | Apple | VA |
| 86418.0 | Apple | VA |
| 86565.0 | Apple | VA |
| 86812.0 | Apple | VA |
| 88516.0 | Apple | VA |
| 88704.0 | Apple | VA |
| 88704.0 | Apple | VA |
| 88704.0 | Apple | VA |
| 88704.0 | Apple | VA |
| 88704.0 | Apple | VA |
| 88704.0 | Apple | VA |
| 88704.0 | Apple | VA |
| 88704.0 | Apple | VA |
| 88704.0 | Apple | VA |
| 88704.0 | Apple | VA |
| 88704.0 | Apple | VA |
| 88704.0 | Apple | VA |
| 88704.0 | Apple | VA |
| 88704.0 | Apple | VA |
| 88704.0 | Apple | VA |
| 88704.0 | Apple | VA |
| 88704.0 | Apple | VA |
| 88704.0 | Apple | VA |
| 91487.0 | Apple | VA |
| 91487.0 | Apple | VA |
| 91487.0 | Apple | VA |
| 91487.0 | Apple | VA |
| 91487.0 | Apple | VA |
| 92114.0 | Apple | VA |
| 92114.0 | Apple | VA |
| 92114.0 | Apple | VA |
| 92114.0 | Apple | VA |
| 92114.0 | Apple | VA |
| 92114.0 | Apple | VA |
| 92114.0 | Apple | VA |
| 92114.0 | Apple | VA |
| 92877.0 | Unknown | VA |
| 92877.0 | Unknown | VA |
| 92877.0 | Unknown | VA |
| 92877.0 | Unknown | VA |
| 93948.0 | Samsung | VA |
| 93948.0 | Samsung | VA |
| 93948.0 | Samsung | VA |
| 93948.0 | Samsung | VA |
| 93948.0 | Samsung | VA |
| 93948.0 | Samsung | VA |
| 93948.0 | Samsung | VA |
| 93948.0 | Samsung | VA |
| 93948.0 | Samsung | VA |
| 93948.0 | Samsung | VA |
| 94406.0 | RIM | VA |
| 94406.0 | RIM | VA |
| 94544.0 | RIM | VA |
| 95285.0 | Motorola | VA |
| 95285.0 | Motorola | VA |
| 96482.0 | Apple | VA |
| 97377.0 | Apple | VA |
| 99963.0 | Apple | VA |
| 100246.0 | Unknown | VA |
| 101106.0 | Apple | VA |
| 101346.0 | HTC | VA |
| 101699.0 | LG | VA |
| 102673.0 | Apple | VA |
| 102673.0 | Apple | VA |
| 102673.0 | Apple | VA |
| 103769.0 | Apple | VA |
| 103769.0 | Apple | VA |
| 103769.0 | Apple | VA |
| 103816.0 | Apple | VA |
| 103816.0 | Apple | VA |
| 103816.0 | Apple | VA |
| 103816.0 | Apple | VA |
| 103816.0 | Apple | VA |
| 103816.0 | Apple | VA |
| 103816.0 | Apple | VA |
| 103816.0 | Apple | VA |
| 103816.0 | Apple | VA |
| 103816.0 | Apple | VA |
| 103816.0 | Apple | VA |
| 103816.0 | Apple | VA |
| 103816.0 | Apple | VA |
| 103816.0 | Apple | VA |
| 103816.0 | Apple | VA |
| 103816.0 | Apple | VA |
| 103816.0 | Apple | VA |
| 103816.0 | Apple | VA |
| 103816.0 | Apple | VA |
| 103816.0 | Apple | VA |
| 103816.0 | Apple | VA |
| 103816.0 | Apple | VA |
| 105828.0 | Unknown | VA |
| 106628.0 | Apple | VA |
| 106665.0 | Apple | VA |
| 110539.0 | Apple | VA |
| 110539.0 | Apple | VA |
| 110539.0 | Apple | VA |
| 110539.0 | Apple | VA |
| 110539.0 | Apple | VA |
| 110539.0 | Apple | VA |
| 111832.0 | Samsung | VA |
| 111832.0 | Samsung | VA |
| 115313.0 | Samsung | VA |
| 116832.0 | Apple | VA |
| 119953.0 | Apple | VA |
| 119953.0 | Apple | VA |
| 119953.0 | Apple | VA |
| 119953.0 | Apple | VA |
| 121290.0 | Samsung | VA |
| 124122.0 | Apple | VA |
| 124122.0 | Apple | VA |
| 124122.0 | Apple | VA |
| 124122.0 | Apple | VA |
| 124122.0 | Apple | VA |
| 124293.0 | RIM | VA |
| 124293.0 | RIM | VA |
| 124293.0 | RIM | VA |
| 126664.0 | Apple | VA |
| 127802.0 | Apple | VA |
| 128989.0 | Apple | VA |
| 128989.0 | Apple | VA |
| 128989.0 | Apple | VA |
| 129236.0 | Motorola | VA |
| 129378.0 | Apple | VA |
| 130696.0 | Apple | VA |
| 130696.0 | Apple | VA |
| 130696.0 | Apple | VA |
| 130696.0 | Apple | VA |
| 130696.0 | Apple | VA |
| 131735.0 | LG | VA |
| 131735.0 | LG | VA |
| 134020.0 | Samsung | VA |
| 134020.0 | Samsung | VA |
| 134020.0 | Samsung | VA |
| 134020.0 | Samsung | VA |
| 134020.0 | Samsung | VA |
| 134020.0 | Samsung | VA |
| 134020.0 | Samsung | VA |
| 134020.0 | Samsung | VA |
| 134020.0 | Samsung | VA |
| 134020.0 | Samsung | VA |
| 135176.0 | Apple | VA |
| 135176.0 | Apple | VA |
| 135176.0 | Apple | VA |
| 135176.0 | Apple | VA |
| 135176.0 | Apple | VA |
| 135176.0 | Apple | VA |
| 135176.0 | Apple | VA |
| 135176.0 | Apple | VA |
| 135176.0 | Apple | VA |
| 135176.0 | Apple | VA |
| 135176.0 | Apple | VA |
| 135176.0 | Apple | VA |
| 135176.0 | Apple | VA |
| 135176.0 | Apple | VA |
| 135176.0 | Apple | VA |
| 135176.0 | Apple | VA |
| 135176.0 | Apple | VA |
| 135176.0 | Apple | VA |
| 135176.0 | Apple | VA |
| 135176.0 | Apple | VA |
| 135176.0 | Apple | VA |
| 137908.0 | Apple | VA |
| 139839.0 | Apple | VA |
| 139839.0 | Apple | VA |
| 140058.0 | Apple | VA |
| 140058.0 | Apple | VA |
| 685.0 | Apple | FL |
| 781.0 | Apple | FL |
| 1156.0 | LG | FL |
| 1156.0 | LG | FL |
| 1156.0 | LG | FL |
| 1156.0 | LG | FL |
| 1371.0 | Apple | FL |
| 1371.0 | Apple | FL |
| 1371.0 | Apple | FL |
| 1371.0 | Apple | FL |
| 1371.0 | Apple | FL |
| 1371.0 | Apple | FL |
| 1371.0 | Apple | FL |
| 1371.0 | Apple | FL |
| 1371.0 | Apple | FL |
| 1371.0 | Apple | FL |
| 1371.0 | Apple | FL |
| 1371.0 | Apple | FL |
| 1371.0 | Apple | FL |
| 1371.0 | Apple | FL |
| 1371.0 | Apple | FL |
| 1775.0 | LG | FL |
| 1775.0 | LG | FL |
| 1775.0 | LG | FL |
| 1775.0 | LG | FL |
| 1775.0 | LG | FL |
| 1935.0 | Samsung | FL |
| 1935.0 | Samsung | FL |
| 2614.0 | Apple | FL |
| 2614.0 | Apple | FL |
| 3253.0 | Samsung | FL |
| 3253.0 | Samsung | FL |
| 3488.0 | Samsung | FL |
| 3488.0 | Samsung | FL |
| 3488.0 | Samsung | FL |
| 3584.0 | HTC | FL |
| 3584.0 | HTC | FL |
| 3724.0 | Samsung | FL |
| 3724.0 | Samsung | FL |
| 3724.0 | Samsung | FL |
| 3825.0 | Apple | FL |
| 4142.0 | Samsung | FL |
| 4142.0 | Samsung | FL |
| 4605.0 | Apple | FL |
| 4605.0 | Apple | FL |
| 4605.0 | Apple | FL |
| 4605.0 | Apple | FL |
| 4605.0 | Apple | FL |
| 4605.0 | Apple | FL |
| 4605.0 | Apple | FL |
| 4605.0 | Apple | FL |
| 4605.0 | Apple | FL |
| 4605.0 | Apple | FL |
| 4725.0 | Apple | FL |
| 4734.0 | Apple | FL |
| 4734.0 | Apple | FL |
| 4734.0 | Apple | FL |
| 4829.0 | LG | FL |
| 4829.0 | LG | FL |
| 4829.0 | LG | FL |
| 4829.0 | LG | FL |
| 4829.0 | LG | FL |
| 4829.0 | LG | FL |
| 4829.0 | LG | FL |
| 4829.0 | LG | FL |
| 5180.0 | Apple | FL |
| 5180.0 | Apple | FL |
| 5180.0 | Apple | FL |
| 5180.0 | Apple | FL |
| 5300.0 | HTC | FL |
| 5300.0 | HTC | FL |
| 5377.0 | LG | FL |
| 5377.0 | LG | FL |
| 5377.0 | LG | FL |
| 5377.0 | LG | FL |
| 5377.0 | LG | FL |
| 5377.0 | LG | FL |
| 5377.0 | LG | FL |
| 5377.0 | LG | FL |
| 5377.0 | LG | FL |
| 5377.0 | LG | FL |
| 5377.0 | LG | FL |
| 5377.0 | LG | FL |
| 5647.0 | Unknown | FL |
| 5647.0 | Unknown | FL |
| 5647.0 | Unknown | FL |
| 5702.0 | Apple | FL |
| 5702.0 | Apple | FL |
| 6082.0 | Apple | FL |
| 6107.0 | Apple | FL |
| 6107.0 | Apple | FL |
| 6107.0 | Apple | FL |
| 6107.0 | Apple | FL |
| 6107.0 | Apple | FL |
| 6107.0 | Apple | FL |
| 6107.0 | Apple | FL |
| 6107.0 | Apple | FL |
| 6266.0 | LG | FL |
| 6266.0 | LG | FL |
| 6266.0 | LG | FL |
| 6285.0 | LG | FL |
| 6285.0 | LG | FL |
| 6285.0 | LG | FL |
| 6285.0 | LG | FL |
| 6285.0 | LG | FL |
| 6827.0 | Apple | FL |
| 7244.0 | Unknown | FL |
| 7244.0 | Unknown | FL |
| 7244.0 | Unknown | FL |
| 7817.0 | HTC | FL |
| 8226.0 | LG | FL |
| 8226.0 | LG | FL |
| 8233.0 | Apple | FL |
| 8604.0 | LG | FL |
| 8604.0 | LG | FL |
| 8772.0 | LG | FL |
| 8902.0 | Apple | FL |
| 9023.0 | Apple | FL |
| 9023.0 | Apple | FL |
| 9023.0 | Apple | FL |
| 9023.0 | Apple | FL |
| 9023.0 | Apple | FL |
| 9023.0 | Apple | FL |
| 9023.0 | Apple | FL |
| 9023.0 | Apple | FL |
| 9023.0 | Apple | FL |
| 9023.0 | Apple | FL |
| 9023.0 | Apple | FL |
| 9023.0 | Apple | FL |
| 9023.0 | Apple | FL |
| 9023.0 | Apple | FL |
| 9023.0 | Apple | FL |
| 9023.0 | Apple | FL |
| 9023.0 | Apple | FL |
| 9023.0 | Apple | FL |
| 9023.0 | Apple | FL |
| 9023.0 | Apple | FL |
| 9379.0 | HTC | FL |
| 10621.0 | Apple | FL |
| 10713.0 | Apple | FL |
| 10961.0 | Samsung | FL |
| 10961.0 | Samsung | FL |
| 10961.0 | Samsung | FL |
| 11102.0 | Samsung | FL |
| 11418.0 | LG | FL |
| 11890.0 | Samsung | FL |
| 12106.0 | Apple | FL |
| 12576.0 | Samsung | FL |
| 12576.0 | Samsung | FL |
| 12576.0 | Samsung | FL |
| 12576.0 | Samsung | FL |
| 12576.0 | Samsung | FL |
| 12576.0 | Samsung | FL |
| 12576.0 | Samsung | FL |
| 12576.0 | Samsung | FL |
| 12576.0 | Samsung | FL |
| 12576.0 | Samsung | FL |
| 12576.0 | Samsung | FL |
| 12576.0 | Samsung | FL |
| 12576.0 | Samsung | FL |
| 12576.0 | Samsung | FL |
| 12576.0 | Samsung | FL |
| 12576.0 | Samsung | FL |
| 12576.0 | Samsung | FL |
| 13568.0 | Samsung | FL |
| 13568.0 | Samsung | FL |
| 13687.0 | RIM | FL |
| 13816.0 | Samsung | FL |
| 13816.0 | Samsung | FL |
| 13816.0 | Samsung | FL |
| 13816.0 | Samsung | FL |
| 13816.0 | Samsung | FL |
| 13816.0 | Samsung | FL |
| 13816.0 | Samsung | FL |
| 13816.0 | Samsung | FL |
| 13816.0 | Samsung | FL |
| 13816.0 | Samsung | FL |
| 13816.0 | Samsung | FL |
| 13816.0 | Samsung | FL |
| 13816.0 | Samsung | FL |
| 13997.0 | Apple | FL |
| 13997.0 | Apple | FL |
| 13997.0 | Apple | FL |
| 14044.0 | Huawei | FL |
| 14095.0 | LG | FL |
| 14095.0 | LG | FL |
| 14184.0 | Apple | FL |
| 14495.0 | Apple | FL |
| 15063.0 | Samsung | FL |
| 15063.0 | Samsung | FL |
| 15063.0 | Samsung | FL |
| 15063.0 | Samsung | FL |
| 15362.0 | LG | FL |
| 15362.0 | LG | FL |
| 15362.0 | LG | FL |
| 15362.0 | LG | FL |
| 15362.0 | LG | FL |
| 16316.0 | Apple | FL |
| 16316.0 | Apple | FL |
| 16316.0 | Apple | FL |
| 16316.0 | Apple | FL |
| 16316.0 | Apple | FL |
| 16316.0 | Apple | FL |
| 16317.0 | LG | FL |
| 16317.0 | LG | FL |
| 16317.0 | LG | FL |
| 16855.0 | Apple | FL |
| 18146.0 | LG | FL |
| 18452.0 | Samsung | FL |
| 18452.0 | Samsung | FL |
| 18452.0 | Samsung | FL |
| 18455.0 | Motorola | FL |
| 18455.0 | Motorola | FL |
| 18455.0 | Motorola | FL |
| 18455.0 | Motorola | FL |
| 18455.0 | Motorola | FL |
| 18455.0 | Motorola | FL |
| 18494.0 | LG | FL |
| 18554.0 | LG | FL |
| 18554.0 | LG | FL |
| 18554.0 | LG | FL |
| 18875.0 | Samsung | FL |
| 18875.0 | Samsung | FL |
| 18875.0 | Samsung | FL |
| 18875.0 | Samsung | FL |
| 18875.0 | Samsung | FL |
| 18875.0 | Samsung | FL |
| 18951.0 | LG | FL |
| 18951.0 | LG | FL |
| 18951.0 | LG | FL |
| 18951.0 | LG | FL |
| 19599.0 | Motorola | FL |
| 19599.0 | Motorola | FL |
| 19599.0 | Motorola | FL |
| 20180.0 | Samsung | FL |
| 20180.0 | Samsung | FL |
| 20180.0 | Samsung | FL |
| 20180.0 | Samsung | FL |
| 20898.0 | Apple | FL |
| 20898.0 | Apple | FL |
| 20898.0 | Apple | FL |
| 20898.0 | Apple | FL |
| 20898.0 | Apple | FL |
| 20898.0 | Apple | FL |
| 20898.0 | Apple | FL |
| 20898.0 | Apple | FL |
| 20898.0 | Apple | FL |
| 20898.0 | Apple | FL |
| 20898.0 | Apple | FL |
| 20898.0 | Apple | FL |
| 20898.0 | Apple | FL |
| 20898.0 | Apple | FL |
| 20898.0 | Apple | FL |
| 20898.0 | Apple | FL |
| 20898.0 | Apple | FL |
| 20898.0 | Apple | FL |
| 20898.0 | Apple | FL |
| 20898.0 | Apple | FL |
| 20898.0 | Apple | FL |
| 20964.0 | Apple | FL |
| 20984.0 | Apple | FL |
| 21383.0 | LG | FL |
| 22502.0 | LG | FL |
| 22502.0 | LG | FL |
| 22545.0 | Samsung | FL |
| 22545.0 | Samsung | FL |
| 22805.0 | LG | FL |
| 22805.0 | LG | FL |
| 23541.0 | LG | FL |
| 23541.0 | LG | FL |
| 23688.0 | HTC | FL |
| 23799.0 | HTC | FL |
| 23799.0 | HTC | FL |
| 23799.0 | HTC | FL |
| 23799.0 | HTC | FL |
| 24094.0 | Samsung | FL |
| 24094.0 | Samsung | FL |
| 24094.0 | Samsung | FL |
| 24094.0 | Samsung | FL |
| 24094.0 | Samsung | FL |
| 24094.0 | Samsung | FL |
| 24299.0 | Samsung | FL |
| 24299.0 | Samsung | FL |
| 24299.0 | Samsung | FL |
| 24305.0 | Motorola | FL |
| 24779.0 | Samsung | FL |
| 24779.0 | Samsung | FL |
| 24779.0 | Samsung | FL |
| 25359.0 | Apple | FL |
| 25703.0 | LG | FL |
| 25703.0 | LG | FL |
| 26452.0 | Apple | FL |
| 26452.0 | Apple | FL |
| 26494.0 | Samsung | FL |
| 26494.0 | Samsung | FL |
| 27143.0 | Samsung | FL |
| 27143.0 | Samsung | FL |
| 27143.0 | Samsung | FL |
| 27143.0 | Samsung | FL |
| 27143.0 | Samsung | FL |
| 27143.0 | Samsung | FL |
| 27143.0 | Samsung | FL |
| 27143.0 | Samsung | FL |
| 27326.0 | HTC | FL |
| 27586.0 | LG | FL |
| 27586.0 | LG | FL |
| 27586.0 | LG | FL |
| 27586.0 | LG | FL |
| 27586.0 | LG | FL |
| 27586.0 | LG | FL |
| 27586.0 | LG | FL |
| 27586.0 | LG | FL |
| 27614.0 | Apple | FL |
| 27614.0 | Apple | FL |
| 27614.0 | Apple | FL |
| 27614.0 | Apple | FL |
| 27614.0 | Apple | FL |
| 27614.0 | Apple | FL |
| 27741.0 | Apple | FL |
| 27741.0 | Apple | FL |
| 28057.0 | Samsung | FL |
| 28057.0 | Samsung | FL |
| 28351.0 | Apple | FL |
| 28585.0 | Samsung | FL |
| 28585.0 | Samsung | FL |
| 28722.0 | LG | FL |
| 28722.0 | LG | FL |
| 28722.0 | LG | FL |
| 28722.0 | LG | FL |
| 28722.0 | LG | FL |
| 28722.0 | LG | FL |
| 28911.0 | LG | FL |
| 28911.0 | LG | FL |
| 28934.0 | Unknown | FL |
| 29106.0 | Samsung | FL |
| 29106.0 | Samsung | FL |
| 29106.0 | Samsung | FL |
| 29106.0 | Samsung | FL |
| 29106.0 | Samsung | FL |
| 29206.0 | Samsung | FL |
| 29206.0 | Samsung | FL |
| 29244.0 | Apple | FL |
| 29244.0 | Apple | FL |
| 29244.0 | Apple | FL |
| 29244.0 | Apple | FL |
| 29581.0 | HTC | FL |
| 29585.0 | Apple | FL |
| 29613.0 | Apple | FL |
| 29613.0 | Apple | FL |
| 29613.0 | Apple | FL |
| 29613.0 | Apple | FL |
| 29613.0 | Apple | FL |
| 29613.0 | Apple | FL |
| 29613.0 | Apple | FL |
| 29613.0 | Apple | FL |
| 29613.0 | Apple | FL |
| 29613.0 | Apple | FL |
| 29745.0 | LG | FL |
| 29745.0 | LG | FL |
| 29745.0 | LG | FL |
| 29745.0 | LG | FL |
| 29745.0 | LG | FL |
| 29745.0 | LG | FL |
| 29745.0 | LG | FL |
| 29745.0 | LG | FL |
| 29745.0 | LG | FL |
| 29745.0 | LG | FL |
| 29745.0 | LG | FL |
| 29745.0 | LG | FL |
| 29745.0 | LG | FL |
| 29745.0 | LG | FL |
| 29814.0 | Samsung | FL |
| 29814.0 | Samsung | FL |
| 29827.0 | LG | FL |
| 29827.0 | LG | FL |
| 30123.0 | Samsung | FL |
| 30123.0 | Samsung | FL |
| 30123.0 | Samsung | FL |
| 30123.0 | Samsung | FL |
| 30462.0 | Motorola | FL |
| 30462.0 | Motorola | FL |
| 30462.0 | Motorola | FL |
| 30556.0 | LG | FL |
| 30556.0 | LG | FL |
| 30607.0 | Samsung | FL |
| 30607.0 | Samsung | FL |
| 30607.0 | Samsung | FL |
| 30607.0 | Samsung | FL |
| 30607.0 | Samsung | FL |
| 30619.0 | Samsung | FL |
| 30619.0 | Samsung | FL |
| 30998.0 | Apple | FL |
| 31123.0 | Samsung | FL |
| 31123.0 | Samsung | FL |
| 31171.0 | LG | FL |
| 31171.0 | LG | FL |
| 31222.0 | Apple | FL |
| 31480.0 | Apple | FL |
| 31615.0 | Motorola | FL |
| 32053.0 | Samsung | FL |
| 32053.0 | Samsung | FL |
| 32053.0 | Samsung | FL |
| 32185.0 | Apple | FL |
| 32394.0 | LG | FL |
| 32465.0 | Unknown | FL |
| 32497.0 | Apple | FL |
| 32584.0 | HTC | FL |
| 40320.0 | Apple | FL |
| ... | ... | ... |
(output truncated; the full result repeats similar rows)
SELECT clientid, DeviceMake FROM mobile_sample WHERE Country = 'United States' AND DeviceMake IN ('Apple', 'Samsung', 'LG', 'RIM', 'HTC', 'Motorola');
| clientid | DeviceMake |
|---|---|
| 8.0 | Samsung |
| 23.0 | HTC |
| 28.0 | Motorola |
| 30.0 | RIM |
| 43.0 | RIM |
| 45.0 | Samsung |
| 49.0 | LG |
| 62.0 | LG |
| 67.0 | Apple |
| 77.0 | LG |
| ... | ... |
(output truncated; the full result repeats similar rows)
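Rather than scanning thousands of raw rows like those above, a grouped count summarizes the same query at a glance. A minimal sketch, assuming the `mobile_sample` table created earlier is still registered in the Spark session:

```scala
// Sketch: summarize device makes for US clients instead of listing raw rows.
// Assumes the mobile_sample table created earlier in this notebook.
val deviceCounts = spark.sql("""
  SELECT DeviceMake, COUNT(*) AS n
  FROM mobile_sample
  WHERE Country = 'United States'
    AND DeviceMake IN ('Apple', 'Samsung', 'LG', 'RIM', 'HTC', 'Motorola')
  GROUP BY DeviceMake
  ORDER BY n DESC
""")
deviceCounts.show()
```

Ordering by the count makes the dominant device makes immediately visible.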
Assignment
Ingest, explore, and play with this Kaggle dataset:
- https://www.kaggle.com/marcodena/mobile-phone-activity/version/1
Here are some resources worth looking at:
- https://databricks.com/session/improving-traffic-prediction-using-weather-data
- https://www.ibm.com/developerworks/
- https://www.ibm.com/developerworks/community/blogs/jacquesroy/entry/talking_timeseries2?lang=en
- https://databricks.com/blog/2017/05/09/detecting-abuse-scale-locality-sensitive-hashing-uber-engineering.html
- https://dzone.com/articles/implementing-live-weather-reporting-with-hdfhorton
- https://github.com/twosigma/flint
- https://databricks.gitbooks.io/databricks-spark-reference-applications/content/timeseries/index.html
- https://blog.cloudera.com/blog/2015/12/spark-ts-a-new-library-for-analyzing-time-series-data-with-apache-spark/
- https://github.com/sryza/spark-timeseries
A lot has transpired since 2020, and new libraries are now available for geospatial analytics at scale.
- https://databricks.com/blog/2019/12/05/processing-geospatial-data-at-scale-with-databricks.html
- https://databricks.com/session_na20/geospatial-options-in-apache-spark
- https://databricks.com/blog/2021/02/11/amplify-insights-into-your-industry-with-geospatial-analytics.html
- https://databricks.com/blog/2020/01/28/geospatial-analytics-public-sector-webcast.html
There is also other research in privacy-aware mobility that uses co-trajectories.
Notebook structure and necessary libraries
Stavroula Rafailia Vlachou (LinkedIn), Virginia Jimenez Mohedano (LinkedIn) and Raazesh Sainudiin (LinkedIn).
This project was supported by SENSMETRY through a Data Science Project Internship
between 2022-01-17 and 2022-06-05 to Stavroula R. Vlachou and Virginia J. Mohedano
and Databricks University Alliance with infrastructure credits from AWS to
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.
2022, Uppsala, Sweden
The common notebooks are:
1. 03301OSMtoGraphXUppsalaTiny: Construction of a road graph from OpenStreetMap (OSM) data with GraphX, with finer partitions, for a small area in Uppsala.
2. 03302OSMtoGraphX_LT: Construction of a road graph corresponding to Lithuania's road network from OSM data with GraphX. OSM data is ingested with methods from the osm-parquetizer project, which are suitable for big data. Further segmentation.
The project's open source code regarding Rafailia's part is structured as follows:
1. 03401MapMatchingwithGeoMatch_UppsalaTiny: GeoMatch: map-matching OSM nodes to OSM ways (showcase).
2. 03402MapMatchingonaGraphUppsalaTiny: GeoMatch: map-matching OSM nodes to a road graph G0, constructed by discretizing the road network provided by OSM.
3. 03403MapMatchingonaGraphLT: GeoMatch: map-matching events of interest (vehicle collisions) onto Lithuania's road graph G0. Revisit the end of this notebook after 034_06SimulatingArrivalTimesNHPP_Inversion to generate a location for each time variate simulated for the NHPP.
4. 03404MapMatchingonaG1LT: GeoMatch: map-matching events of interest (vehicle collisions) onto Lithuania's coarsened road graph G1 (under a distance threshold of 100 meters).
5. 034_05DistributionOfStates: The conditional/posterior distributions of the states given a time unit, and the distribution of the states independent of time.
6. 03406SimulatingArrivalTimesNHPPInversion: Simulation of the arrival times of an NHPP from one or more realisations.
The project's open source code regarding Virginia's part is structured as follows:
1. 03501Arcgiscoordinatestransformation: Transformation of coordinates using the ArcGIS Runtime library.
2. 03502SegmentationmunicipalitiesMagellan: Magellan: locating the accidents within each municipality.
3. 03503Visualization_municipalities: Visualizations of accidents in municipalities using Python.
4. 03504MapMatching_intersections: GeoMatch: map-matching accidents with their closest intersection and measuring the distance between them.
5. 03505UndirectedG0: Undirected graph from the topological road graph created using OpenStreetMap (OSM) data.
6. 03506ConnectedComponent_PageRank: The connected-components algorithm is applied to the undirected G0, together with the PageRank algorithm.
7. 03507PoissonRegression: Poisson regression on the number of accidents based on different factors.
Maven libraries that need to be installed in the cluster
com.graphhopper:map-matching:0.6.0
io.spray:spray-json_2.11:1.3.4
org.openstreetmap.osmosis:osmosis-osm-binary:0.45
org.openstreetmap.osmosis:osmosis-pbf:0.45
org.openstreetmap.osmosis:osmosis-core:0.45
com.esri.geometry:esri-geometry-api:2.1.0
org.cusp.bdi.gm.GeoMatch
Creating a road graph from OpenStreetMap (OSM) data with GraphX
Stavroula Rafailia Vlachou (LinkedIn), Virginia Jimenez Mohedano (LinkedIn) and Raazesh Sainudiin (LinkedIn).
This project was supported by SENSMETRY through a Data Science Project Internship
between 2022-01-17 and 2022-06-05 to Stavroula R. Vlachou and Virginia J. Mohedano
and Databricks University Alliance with infrastructure credits from AWS to
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.
2022, Uppsala, Sweden
This project builds on top of the work of Dillon George (2016-2018).
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
ls /datasets/osm/uppsala
| path | name | size |
|---|---|---|
| dbfs:/datasets/osm/uppsala/.uppsalaTinyR.pbf.node.parquet.crc | .uppsalaTinyR.pbf.node.parquet.crc | 172.0 |
| dbfs:/datasets/osm/uppsala/.uppsalaTinyR.pbf.relation.parquet.crc | .uppsalaTinyR.pbf.relation.parquet.crc | 84.0 |
| dbfs:/datasets/osm/uppsala/.uppsalaTinyR.pbf.way.parquet.crc | .uppsalaTinyR.pbf.way.parquet.crc | 84.0 |
| dbfs:/datasets/osm/uppsala/uppsalaTinyR.pbf | uppsalaTinyR.pbf | 17867.0 |
| dbfs:/datasets/osm/uppsala/uppsalaTinyR.pbf.node.parquet | uppsalaTinyR.pbf.node.parquet | 20829.0 |
| dbfs:/datasets/osm/uppsala/uppsalaTinyR.pbf.relation.parquet | uppsalaTinyR.pbf.relation.parquet | 9394.0 |
| dbfs:/datasets/osm/uppsala/uppsalaTinyR.pbf.way.parquet | uppsalaTinyR.pbf.way.parquet | 9542.0 |
| dbfs:/datasets/osm/uppsala/uppsalaTinyV.osm.pbf | uppsalaTinyV.osm.pbf | 30606.0 |
import crosby.binary.osmosis.OsmosisReader
import org.apache.hadoop.mapreduce.{TaskAttemptContext, JobContext}
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.openstreetmap.osmosis.core.container.v0_6.EntityContainer
import org.openstreetmap.osmosis.core.domain.v0_6._
import org.openstreetmap.osmosis.core.task.v0_6.Sink
import sqlContext.implicits._
import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.Map
import scala.collection.JavaConversions._
import org.apache.spark.graphx._
val allowableWays = Set(
"motorway",
"motorway_link",
"trunk",
"trunk_link",
"primary",
"primary_link",
"secondary",
"secondary_link",
"tertiary",
"tertiary_link",
"living_street",
"residential",
"road",
"construction",
"motorway_junction"
)
allowableWays: scala.collection.immutable.Set[String] = Set(construction, primary_link, secondary_link, secondary, residential, trunk_link, tertiary_link, motorway_link, motorway, tertiary, road, trunk, living_street, primary, motorway_junction)
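The filter applied in the reader below keeps a way if the intersection of its tag values with `allowableWays` is non-empty. A tiny pure-Scala sketch of that set test (the tag values shown here are hypothetical):

```scala
// Sketch of the tag filter used in the Sink below: a way is kept when
// any of its tag values is one of the allowable highway types.
val allowable = Set("residential", "primary", "secondary")
val wayTagValues = Set("Kungsgatan", "secondary", "asphalt") // hypothetical tag values of one way
val keep = (wayTagValues & allowable).nonEmpty // true, because "secondary" matches
```

Note that this matches on tag *values* rather than keys, so any tag whose value happens to equal, say, `residential` would also pass; a stricter filter would check the value of the `highway` key specifically.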
val fs = FileSystem.get(new Configuration())
val path = new Path("dbfs:/datasets/osm/uppsala/uppsalaTinyR.pbf")
val file = fs.open(path)
var nodes: ArrayBuffer[Node] = ArrayBuffer()
var ways: ArrayBuffer[Way] = ArrayBuffer()
var relations: ArrayBuffer[Relation] = ArrayBuffer()
val osmosisReader = new OsmosisReader(file)
osmosisReader.setSink(new Sink {
override def process(entityContainer: EntityContainer): Unit = {
if (entityContainer.getEntity.getType != EntityType.Bound) {
val entity = entityContainer.getEntity
entity match {
case node: Node => nodes += node
case way: Way => {
val tagSet = way.getTags.map(_.getValue).toSet
if ( !(tagSet & allowableWays).isEmpty ) {
// way has at least one tag of interest
ways += way
}
}
case relation: Relation => relations += relation
}
}
}
override def initialize(map: java.util.Map[String, AnyRef]): Unit = {
nodes = ArrayBuffer()
ways = ArrayBuffer()
relations = ArrayBuffer()
}
override def complete(): Unit = {}
override def release(): Unit = {} // required by the v0_6 Osmosis API
def close(): Unit = {}
})
osmosisReader.run()
case class WayEntry(wayId: Long, tags: Array[String], nodes: Array[Long])
case class NodeEntry(nodeId: Long, latitude: Double, longitude: Double, tags: Array[String])
defined class WayEntry
defined class NodeEntry
//convert the nodes array to Dataset
val nodeDS = nodes.map{node =>
NodeEntry(node.getId,
node.getLatitude,
node.getLongitude,
node.getTags.map(_.getValue).toArray
)}.toDS
nodeDS: org.apache.spark.sql.Dataset[NodeEntry] = [nodeId: bigint, latitude: double ... 2 more fields]
nodeDS.count()
res2: Long = 627
nodeDS.show(5, false)
+--------+------------------+------------------+----+
|nodeId |latitude |longitude |tags|
+--------+------------------+------------------+----+
|312339 |59.856328500000004|17.6430124 |[] |
|312352 |59.85636590000001 |17.6478229 |[] |
|312353 |59.857437700000006|17.645897700000003|[] |
|312363 |59.857601900000006|17.6432529 |[] |
|25724030|59.857001200000006|17.6418004 |[] |
+--------+------------------+------------------+----+
only showing top 5 rows
//convert the ways array to Dataset
val wayDS = ways.map(way =>
WayEntry(way.getId,
way.getTags.map(_.getValue).toArray,
way.getWayNodes.map(_.getNodeId).toArray)
).toDS.cache
wayDS: org.apache.spark.sql.Dataset[WayEntry] = [wayId: bigint, tags: array<string> ... 1 more field]
wayDS.count()
res5: Long = 9
wayDS.show(9, false)
+---------+--------------------------------------------------------------------------+----------------------------------------------+
|wayId |tags |nodes |
+---------+--------------------------------------------------------------------------+----------------------------------------------+
|4281074 |[living_street, Bredgränd, paving_stones] |[25812013] |
|73834008 |[4, secondary, 4, 40, Kungsgatan, asphalt] |[25734373, 312352, 3431600977] |
|263934971|[living_street, 7, Dragarbrunnsgatan, paving_stones, sv:Dragarbrunnsgatan]|[3067700668, 312363] |
|263934973|[living_street, 7, Dragarbrunnsgatan, paving_stones, sv:Dragarbrunnsgatan]|[312363, 3067700665, 25735257, 3067700641] |
|299906437|[4, secondary, 3, 2, 1, 40, Kungsgatan, asphalt] |[312353, 801437007, 2187779764, 25734373] |
|302521477|[residential, Dragarbrunnsgatan, asphalt, sv:Dragarbrunnsgatan] |[3067700641, 2206536285, 25734470, 2206536278]|
|302521479|[4, secondary, 3, 2, 1, 40, Kungsgatan, asphalt] |[455006648] |
|393182257|[living_street, yes, Vretgränd, no, asphalt] |[3963994985, 25735257] |
|733389337|[4, secondary, 3, 2, 1, 40, Kungsgatan, asphalt] |[455006648, 1523899738, 312353] |
+---------+--------------------------------------------------------------------------+----------------------------------------------+
import org.apache.spark.sql.functions.explode
val nodeCounts = wayDS
.select(explode('nodes).as("node"))
.groupBy('node).count
nodeCounts.show(5)
+----------+-----+
| node|count|
+----------+-----+
| 312363| 2|
| 455006648| 2|
| 25812013| 1|
|3067700668| 1|
| 25735257| 2|
+----------+-----+
only showing top 5 rows
import org.apache.spark.sql.functions.explode
nodeCounts: org.apache.spark.sql.DataFrame = [node: bigint, count: bigint]
val intersectionNodes = nodeCounts.filter('count >= 2).select('node.alias("intersectionNode"))
intersectionNodes: org.apache.spark.sql.DataFrame = [intersectionNode: bigint]
intersectionNodes.count() //there are 6 intersections in this area
res10: Long = 6
val true_intersections = intersectionNodes
true_intersections: org.apache.spark.sql.DataFrame = [intersectionNode: bigint]
true_intersections.count
res12: Long = 6
intersectionNodes.show()
+----------------+
|intersectionNode|
+----------------+
| 312363|
| 455006648|
| 25735257|
| 25734373|
| 312353|
| 3067700641|
+----------------+
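The intersection rule used above — a node is a true intersection when it appears in at least two ways — can be sketched without Spark; the node ids below are made up:

```scala
// Each way is a sequence of node ids; a node appearing in >= 2 ways
// counts as an intersection (mirrors explode + groupBy + count >= 2).
val ways = Seq(
  Seq(1L, 2L, 3L),
  Seq(3L, 4L, 5L),
  Seq(5L, 6L, 1L))

val counts = ways.flatten.groupBy(identity).map { case (k, v) => k -> v.size }
val intersections = counts.filter(_._2 >= 2).keys.toSet
// intersections == Set(1L, 3L, 5L)
```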
val distinctNodesWays = wayDS.flatMap(_.nodes).distinct //the distinct nodes within the ways
distinctNodesWays: org.apache.spark.sql.Dataset[Long] = [value: bigint]
distinctNodesWays.printSchema
root
|-- value: long (nullable = false)
distinctNodesWays.count()
res16: Long = 18
distinctNodesWays.show(5)
+----------+
| value|
+----------+
| 312363|
| 455006648|
| 25812013|
|3067700668|
| 25735257|
+----------+
only showing top 5 rows
val wayNodes = nodeDS.as("nodes") //nodes that are in a way + nodes info from nodeDS
.joinWith(distinctNodesWays.as("ways"), $"ways.value" === $"nodes.nodeId")
.map(_._1).cache
wayNodes: org.apache.spark.sql.Dataset[NodeEntry] = [nodeId: bigint, latitude: double ... 2 more fields]
wayNodes.printSchema
root
|-- nodeId: long (nullable = false)
|-- latitude: double (nullable = false)
|-- longitude: double (nullable = false)
|-- tags: array (nullable = true)
| |-- element: string (containsNull = true)
wayNodes.count()
res20: Long = 18
wayNodes.show(5, false) //the nodes, with their coordinates, that participate in the ways
+--------+------------------+------------------+----+
|nodeId |latitude |longitude |tags|
+--------+------------------+------------------+----+
|312352 |59.85636590000001 |17.6478229 |[] |
|312353 |59.857437700000006|17.645897700000003|[] |
|312363 |59.857601900000006|17.6432529 |[] |
|25734373|59.8567674 |17.6471041 |[] |
|25734470|59.8562881 |17.6456634 |[] |
+--------+------------------+------------------+----+
only showing top 5 rows
wayDS.printSchema
root
|-- wayId: long (nullable = false)
|-- tags: array (nullable = true)
| |-- element: string (containsNull = true)
|-- nodes: array (nullable = true)
| |-- element: long (containsNull = false)
val intersectionSetVal = intersectionNodes.as[Long].collect.toSet; //turn intersectionNodes to Set
intersectionSetVal: scala.collection.immutable.Set[Long] = Set(3067700641, 312363, 455006648, 312353, 25735257, 25734373)
import org.apache.spark.sql.functions.{collect_list, map, udf}
import org.apache.spark.sql.functions._
// Assumes each "nodes" sequence contains at least one node
// The first and last elements of a way's node sequence are endpoints;
// when combining with the original nodes we simply label them true
val remove_first_and_last = udf((x: Seq[Long]) => x.drop(1).dropRight(1)) // note: defined here but not used below
val nodes = wayDS.
select($"wayId", $"nodes").
withColumn("node", explode($"nodes")).
drop("nodes")
val get_first_and_last = udf((x: Seq[Long]) => {val first = x(0); val last = x.reverse(0); Array(first, last)})
val first_and_last_nodes = wayDS.
select($"wayId", get_first_and_last($"nodes").as("nodes")).
withColumn("node", explode($"nodes")).
drop("nodes")
val fake_intersections = first_and_last_nodes.select($"node").distinct().withColumnRenamed("node", "value")
// Alternative: turn the intersection set into a Dataset to join (all values must be unique)
// val intersections = intersectionSetVal.toSeq.toDF("value")
val intersections = intersectionNodes.union(fake_intersections).distinct //true plus fake (endpoint) intersections
val wayNodesLocated = nodes.join(wayNodes, wayNodes.col("nodeId") === nodes.col("node")).select($"wayId", $"node", $"latitude", $"longitude")
// case class MappedWay(wayId: Long, labels: Seq[Map[Long, Boolean]])
case class MappedWay(wayId: Long, labels_located: Seq[Map[Long, (Boolean, Double, Double)]])
val maps = wayNodesLocated.join(intersections, 'node === 'intersectionNode, "left_outer").
//a left outer join returns all rows from the left Dataset, whether or not a match is found on the right
select($"wayId", $"node", $"intersectionNode".isNotNull.as("contains"), $"latitude", $"longitude").
groupBy("wayId").agg(collect_list(map($"node", struct($"contains".as("_1"), $"latitude".as("_2"), $"longitude".as("_3")))).as("labels_located")).as[MappedWay]
val combine = udf((nodes: Seq[Long], labels_located: Seq[scala.collection.immutable.Map[Long, (Boolean, Double, Double)]]) => {
// If labels_located has no entry for a node, it is a way start/end: assign label = true, latitude = 0, longitude = 0 (TODO: revisit this default)
val m = labels_located.map(_.toSeq).flatten.toMap
nodes.map { node => (node, m.getOrElse(node, (true, 0D, 0D))) } //add structure
})
val strSchema = "array<struct<nodeId:long, nodeInfo:struct<label:boolean, latitude:double, longitude: double>>>"
val labeledWays = wayDS.join(maps, "wayId")
.select($"wayId", $"tags", combine($"nodes", $"labels_located").as("labeledNodes").cast(strSchema))
import org.apache.spark.sql.functions.{collect_list, map, udf}
import org.apache.spark.sql.functions._
remove_first_and_last: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,ArrayType(LongType,false),Some(List(ArrayType(LongType,false))))
nodes: org.apache.spark.sql.DataFrame = [wayId: bigint, node: bigint]
get_first_and_last: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,ArrayType(LongType,false),Some(List(ArrayType(LongType,false))))
first_and_last_nodes: org.apache.spark.sql.DataFrame = [wayId: bigint, node: bigint]
fake_intersections: org.apache.spark.sql.DataFrame = [value: bigint]
intersections: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [intersectionNode: bigint]
wayNodesLocated: org.apache.spark.sql.DataFrame = [wayId: bigint, node: bigint ... 2 more fields]
defined class MappedWay
maps: org.apache.spark.sql.Dataset[MappedWay] = [wayId: bigint, labels_located: array<map<bigint,struct<_1:boolean,_2:double,_3:double>>>]
combine: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,ArrayType(StructType(StructField(_1,LongType,false), StructField(_2,StructType(StructField(_1,BooleanType,false), StructField(_2,DoubleType,false), StructField(_3,DoubleType,false)),true)),true),Some(List(ArrayType(LongType,false), ArrayType(MapType(LongType,StructType(StructField(_1,BooleanType,false), StructField(_2,DoubleType,false), StructField(_3,DoubleType,false)),true),true))))
strSchema: String = array<struct<nodeId:long, nodeInfo:struct<label:boolean, latitude:double, longitude: double>>>
labeledWays: org.apache.spark.sql.DataFrame = [wayId: bigint, tags: array<string> ... 1 more field]
labeledWays.printSchema
root
|-- wayId: long (nullable = false)
|-- tags: array (nullable = true)
| |-- element: string (containsNull = true)
|-- labeledNodes: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- nodeId: long (nullable = true)
| | |-- nodeInfo: struct (nullable = true)
| | | |-- label: boolean (nullable = true)
| | | |-- latitude: double (nullable = true)
| | | |-- longitude: double (nullable = true)
labeledWays.select("wayId", "labeledNodes").show(9, false)
+---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|wayId |labeledNodes |
+---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|393182257|[[3963994985, [true, 59.857381800000006, 17.645299100000003]], [25735257, [true, 59.8569759, 17.644382]]] |
|733389337|[[455006648, [true, 59.857930700000004, 17.6450031]], [1523899738, [false, 59.8575528, 17.645685500000003]], [312353, [true, 59.857437700000006, 17.645897700000003]]] |
|299906437|[[312353, [true, 59.857437700000006, 17.645897700000003]], [801437007, [false, 59.8571596, 17.6463952]], [2187779764, [false, 59.856883200000006, 17.6468947]], [25734373, [true, 59.8567674, 17.6471041]]] |
|263934973|[[312363, [true, 59.857601900000006, 17.6432529]], [3067700665, [false, 59.8575443, 17.6433633]], [25735257, [true, 59.8569759, 17.644382]], [3067700641, [true, 59.856720800000005, 17.6448606]]] |
|73834008 |[[25734373, [true, 59.8567674, 17.6471041]], [312352, [false, 59.85636590000001, 17.6478229]], [3431600977, [true, 59.85631480000001, 17.6479153]]] |
|302521479|[[455006648, [true, 59.857930700000004, 17.6450031]]] |
|302521477|[[3067700641, [true, 59.856720800000005, 17.6448606]], [2206536285, [false, 59.8563708, 17.645517400000003]], [25734470, [false, 59.8562881, 17.6456634]], [2206536278, [true, 59.85618040000001, 17.6458707]]]|
|263934971|[[3067700668, [true, 59.857640200000006, 17.6431843]], [312363, [true, 59.857601900000006, 17.6432529]]] |
|4281074 |[[25812013, [true, 59.8578769, 17.641676]]] |
+---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
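The `combine` UDF above is a pure function over a way's node list and a lookup map, so its behaviour can be checked directly; the node ids and coordinates below are made up:

```scala
// For each node in a way, look up (isIntersection, lat, lon);
// nodes missing from the map (way endpoints) default to (true, 0.0, 0.0).
def combine(nodes: Seq[Long],
            located: Map[Long, (Boolean, Double, Double)]): Seq[(Long, (Boolean, Double, Double))] =
  nodes.map(n => (n, located.getOrElse(n, (true, 0.0, 0.0))))

val located = Map(2L -> (false, 59.85, 17.64))
combine(Seq(1L, 2L, 3L), located)
// -> Seq((1,(true,0.0,0.0)), (2,(false,59.85,17.64)), (3,(true,0.0,0.0)))
```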
case class Intersection(OSMId: Long , latitude: Double, longitude: Double, inBuf: ArrayBuffer[(Long, Double, Double)], outBuf: ArrayBuffer[(Long, Double, Double)])
defined class Intersection
val segmentedWays = labeledWays.map(way => {
val labeledNodes = way.getAs[Seq[Row]]("labeledNodes").map{case Row(k: Long, Row(v: Boolean, w:Double, x:Double)) => (k, v,w,x)}.toSeq //labeledNodes: (nodeid, label, lat, long)
val wayId = way.getAs[Long]("wayId")
val indexedNodes: Seq[((Long, Boolean, Double, Double), Int)] = labeledNodes.zipWithIndex //pairs each labeled node in the way with its integer index
val intersections = ArrayBuffer[Intersection]()
val currentBuffer = ArrayBuffer[(Long, Double, Double)]()
val way_length = labeledNodes.length //number of nodes in a way
if (way_length == 1) {
val intersect = new Intersection(labeledNodes(0)._1, labeledNodes(0)._3, labeledNodes(0)._4, ArrayBuffer((-1L, 0D, 0D)), ArrayBuffer((-1L, 0D, 0D))) //include lat and long info
var result = Array((intersect.OSMId, intersect.latitude, intersect.longitude, intersect.inBuf.toArray, intersect.outBuf.toArray))
(wayId, result) //return
}
else {
indexedNodes.foreach{ case ((id, isIntersection, latitude, longitude), i) => // id is nodeId and isIntersection is the node label
if (isIntersection) {
val newEntry = new Intersection(id, latitude, longitude, currentBuffer.clone, ArrayBuffer[(Long, Double, Double)]())
intersections += newEntry
currentBuffer.clear
}
else {
currentBuffer ++= Array((id, latitude, longitude)) //if the node is not an intersection append the nodeId to the current buffer
}
// At the end of the way, if the currentBuffer is not empty,
// append it to the last intersection's outBuf
if (i == way_length - 1 && !currentBuffer.isEmpty) {
if (intersections.isEmpty){
//intersections += new Intersection(-1L, 0D, 0D, ArrayBuffer[(Long, Double, Double)](), currentBuffer) //alternative: keep the buffer in outBuf instead (left for reference)
intersections += new Intersection(-1, 0D, 0D, currentBuffer, ArrayBuffer[(Long, Double, Double)]())
}
else {
intersections.last.outBuf ++= currentBuffer
}
currentBuffer.clear
}
}
var result = intersections.map(i => (i.OSMId, i.latitude, i.longitude, i.inBuf.toArray, i.outBuf.toArray)).toArray
(wayId, result)
}
})
//segmentedWays contains two columns:
//_1: wayId
//_2: Array[(nodeId, latitude, longitude, inBuff, outBuff)] for each intersection node in the way
segmentedWays: org.apache.spark.sql.Dataset[(Long, Array[(Long, Double, Double, Array[(Long, Double, Double)], Array[(Long, Double, Double)])])] = [_1: bigint, _2: array<struct<_1:bigint,_2:double,_3:double,_4:array<struct<_1:bigint,_2:double,_3:double>>,_5:array<struct<_1:bigint,_2:double,_3:double>>>>]
val schema = "array<struct<nodeId:bigint,latitude:double,longitude:double,inBuff:array<struct<nodeId:bigint,latitude:double,longitude:double>>,outBuff:array<struct<nodeId:bigint,latitude:double,longitude:double>>>>"
segmentedWays.select($"_1".alias("wayId"), $"_2".cast(schema).alias("nodeInfo")).printSchema()
root
|-- wayId: long (nullable = false)
|-- nodeInfo: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- nodeId: long (nullable = true)
| | |-- latitude: double (nullable = true)
| | |-- longitude: double (nullable = true)
| | |-- inBuff: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- nodeId: long (nullable = true)
| | | | |-- latitude: double (nullable = true)
| | | | |-- longitude: double (nullable = true)
| | |-- outBuff: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- nodeId: long (nullable = true)
| | | | |-- latitude: double (nullable = true)
| | | | |-- longitude: double (nullable = true)
schema: String = array<struct<nodeId:bigint,latitude:double,longitude:double,inBuff:array<struct<nodeId:bigint,latitude:double,longitude:double>>,outBuff:array<struct<nodeId:bigint,latitude:double,longitude:double>>>>
segmentedWays.show(2, false)
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
|_1 |_2 |
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
|393182257|[[3963994985, 59.857381800000006, 17.645299100000003, [], []], [25735257, 59.8569759, 17.644382, [], []]] |
|733389337|[[455006648, 59.857930700000004, 17.6450031, [], []], [312353, 59.857437700000006, 17.645897700000003, [[1523899738, 59.8575528, 17.645685500000003]], []]]|
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
only showing top 2 rows
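The core of the segmentation loop above, stripped of Spark, splits a labeled node list at its intersection nodes: each intersection collects the non-intersection nodes seen since the previous one into its inBuf. The sketch below is simplified (it omits the trailing-buffer and single-node cases handled above); the ids are illustrative:

```scala
// Split a way's (nodeId, isIntersection) list at intersections:
// each intersection keeps the preceding non-intersection nodes.
def segment(nodes: Seq[(Long, Boolean)]): Seq[(Long, Seq[Long])] = {
  val out = scala.collection.mutable.ArrayBuffer[(Long, Seq[Long])]()
  var buf = Vector[Long]()
  nodes.foreach { case (id, isIntersection) =>
    if (isIntersection) { out += ((id, buf)); buf = Vector() }
    else buf :+= id
  }
  out.toSeq
}

segment(Seq((1L, true), (2L, false), (3L, false), (4L, true)))
// -> Seq((1, Seq()), (4, Seq(2, 3)))
```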
//The nested structure of the segmentedWays is unwrapped
val waySegmentDS = segmentedWays
.flatMap(way => way._2.map(node => (way._1, node)))
// for each (wayId, Array[IntersectionNode]) emit one (wayId, IntersectionNode) row
waySegmentDS: org.apache.spark.sql.Dataset[(Long, (Long, Double, Double, Array[(Long, Double, Double)], Array[(Long, Double, Double)]))] = [_1: bigint, _2: struct<_1: bigint, _2: double ... 3 more fields>]
waySegmentDS.printSchema
root
|-- _1: long (nullable = false)
|-- _2: struct (nullable = true)
| |-- _1: long (nullable = false)
| |-- _2: double (nullable = false)
| |-- _3: double (nullable = false)
| |-- _4: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- _1: long (nullable = false)
| | | |-- _2: double (nullable = false)
| | | |-- _3: double (nullable = false)
| |-- _5: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- _1: long (nullable = false)
| | | |-- _2: double (nullable = false)
| | | |-- _3: double (nullable = false)
waySegmentDS.show(5, false)
+---------+----------------------------------------------------------------------------------------------------+
|_1 |_2 |
+---------+----------------------------------------------------------------------------------------------------+
|393182257|[3963994985, 59.857381800000006, 17.645299100000003, [], []] |
|393182257|[25735257, 59.8569759, 17.644382, [], []] |
|733389337|[455006648, 59.857930700000004, 17.6450031, [], []] |
|733389337|[312353, 59.857437700000006, 17.645897700000003, [[1523899738, 59.8575528, 17.645685500000003]], []]|
|299906437|[312353, 59.857437700000006, 17.645897700000003, [], []] |
+---------+----------------------------------------------------------------------------------------------------+
only showing top 5 rows
import scala.collection.immutable.Map
import scala.collection.immutable.Map
//maps each intersection node to the ways it appears in, together with those ways' buffered nodes (inBuff, outBuff)
val intersectionVertices = waySegmentDS
.map(way =>
//nodeId latitude longitude wayId inBuff outBuff
(way._2._1, (way._2._2, way._2._3, Map(way._1 -> (way._2._4, way._2._5)))))
.rdd
// latitude, long, Map(wayId, inBuff, outBuff)
.reduceByKey((a,b) => (a._1, a._2, a._3 ++ b._3))
//intersectionVertices = RDD[(nodeId, (latitude, longitude, wayMap(wayId -> inBuff, outBuff)))]
intersectionVertices: org.apache.spark.rdd.RDD[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = ShuffledRDD[259] at reduceByKey at command-588572986432353:8
intersectionVertices.map(vertex => (vertex._1, vertex._2._1, vertex._2._2)).toDF("vertexId", "latitude", "longitude").write.mode("overwrite").parquet("dbfs:/graphs/uppsala/vertices")
intersectionVertices.count()
res32: Long = 11
intersectionVertices.take(10)
res33: Array[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = Array((25812013,(59.8578769,17.641676,Map(4281074 -> (Array((-1,0.0,0.0)),Array((-1,0.0,0.0)))))), (455006648,(59.857930700000004,17.6450031,Map(733389337 -> (Array(),Array()), 302521479 -> (Array((-1,0.0,0.0)),Array((-1,0.0,0.0)))))), (25735257,(59.8569759,17.644382,Map(393182257 -> (Array(),Array()), 263934973 -> (Array((3067700665,59.8575443,17.6433633)),Array())))), (3431600977,(59.85631480000001,17.6479153,Map(73834008 -> (Array((312352,59.85636590000001,17.6478229)),Array())))), (3963994985,(59.857381800000006,17.645299100000003,Map(393182257 -> (Array(),Array())))), (3067700641,(59.856720800000005,17.6448606,Map(263934973 -> (Array(),Array()), 302521477 -> (Array(),Array())))), (312353,(59.857437700000006,17.645897700000003,Map(733389337 -> (Array((1523899738,59.8575528,17.645685500000003)),Array()), 299906437 -> (Array(),Array())))), (312363,(59.857601900000006,17.6432529,Map(263934973 -> (Array(),Array()), 263934971 -> (Array(),Array())))), (3067700668,(59.857640200000006,17.6431843,Map(263934971 -> (Array(),Array())))), (25734373,(59.8567674,17.6471041,Map(299906437 -> (Array((801437007,59.8571596,17.6463952), (2187779764,59.856883200000006,17.6468947)),Array()), 73834008 -> (Array(),Array())))))
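The `reduceByKey` step merges, for a node shared by several ways, the per-way maps with `++`; the same merge on plain tuples (the way ids are taken from the output above, the buffer payloads are placeholder strings):

```scala
// A node reached via two ways: merge their wayId -> buffers maps,
// keeping the coordinates from the first value (as reduceByKey does above).
val a = (59.8579307, 17.6450031, Map(733389337L -> "buffersA"))
val b = (59.8579307, 17.6450031, Map(302521479L -> "buffersB"))
val merged = (a._1, a._2, a._3 ++ b._3)
// merged._3 now holds entries for both ways
```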
val edges = segmentedWays
.filter(way => way._2.length > 1) //ways with more than one intersection
.flatMap{ case (wayId, nodes_info) => {
nodes_info.sliding(2) // For each way it takes nodes in pairs
.flatMap(segment => //segment is the pair of two nodes
List(Edge(segment(0)._1, segment(1)._1, wayId))
)
}}
edges: org.apache.spark.sql.Dataset[org.apache.spark.graphx.Edge[Long]] = [srcId: bigint, dstId: bigint ... 1 more field]
edges.map(edge => (edge.srcId, edge.dstId)).toDF("src","dst").write.mode("overwrite").parquet("dbfs:/graphs/uppsala/edges")
edges.printSchema
root
|-- srcId: long (nullable = false)
|-- dstId: long (nullable = false)
|-- attr: long (nullable = false)
edges.count
res35: Long = 8
val roadGraph = Graph(intersectionVertices, edges.rdd).cache
//intersectionVertices = RDD[(nodeId, (latitude, longitude, wayMap(wayId -> inBuff, outBuff)))]
//edges = srcId, dstId, attribute (attribute is the wayId)
roadGraph: org.apache.spark.graphx.Graph[(Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]),Long] = org.apache.spark.graphx.impl.GraphImpl@4114626
roadGraph.edges.take(10).foreach(println)
Edge(3963994985,25735257,393182257)
Edge(455006648,312353,733389337)
Edge(312353,25734373,299906437)
Edge(312363,25735257,263934973)
Edge(25735257,3067700641,263934973)
Edge(25734373,3431600977,73834008)
Edge(3067700641,2206536278,302521477)
Edge(3067700668,312363,263934971)
package d3
// We use a package object so that we can define top level classes like Edge that need to be used in other cells
// This was modified by Ivan Sadikov to make sure it is compatible with the latest Databricks notebook
import org.apache.spark.sql._
import com.databricks.backend.daemon.driver.EnhancedRDDFunctions.displayHTML
case class Edge(src: String, dest: String, count: Long)
case class Node(name: String)
case class Link(source: Int, target: Int, value: Long)
case class Graph(nodes: Seq[Node], links: Seq[Link])
object graphs {
// val sqlContext = SQLContext.getOrCreate(org.apache.spark.SparkContext.getOrCreate()) /// fix
val sqlContext = SparkSession.builder().getOrCreate().sqlContext
import sqlContext.implicits._
def force(clicks: Dataset[Edge], height: Int = 100, width: Int = 960): Unit = {
val data = clicks.collect()
val nodes = (data.map(_.src) ++ data.map(_.dest)).map(_.replaceAll("_", " ")).toSet.toSeq.map(Node)
val links = data.map { t =>
Link(nodes.indexWhere(_.name == t.src.replaceAll("_", " ")), nodes.indexWhere(_.name == t.dest.replaceAll("_", " ")), t.count / 20 + 1)
}
showGraph(height, width, Seq(Graph(nodes, links)).toDF().toJSON.first())
}
/**
* Displays a force directed graph using d3
* input: {"nodes": [{"name": "..."}], "links": [{"source": 1, "target": 2, "value": 0}]}
*/
def showGraph(height: Int, width: Int, graph: String): Unit = {
displayHTML(s"""
<style>
.node_circle {
stroke: #777;
stroke-width: 1.3px;
}
.node_label {
pointer-events: none;
}
.link {
stroke: #777;
stroke-opacity: .2;
}
.node_count {
stroke: #777;
stroke-width: 1.0px;
fill: #999;
}
text.legend {
font-family: Verdana;
font-size: 13px;
fill: #000;
}
.node text {
font-family: "Helvetica Neue","Helvetica","Arial",sans-serif;
font-size: 17px;
font-weight: 200;
}
</style>
<div id="clicks-graph">
<script src="//d3js.org/d3.v3.min.js"></script>
<script>
var graph = $graph;
var width = $width,
height = $height;
var color = d3.scale.category20();
var force = d3.layout.force()
.charge(-700)
.linkDistance(180)
.size([width, height]);
var svg = d3.select("#clicks-graph").append("svg")
.attr("width", width)
.attr("height", height);
force
.nodes(graph.nodes)
.links(graph.links)
.start();
var link = svg.selectAll(".link")
.data(graph.links)
.enter().append("line")
.attr("class", "link")
.style("stroke-width", function(d) { return Math.sqrt(d.value); });
var node = svg.selectAll(".node")
.data(graph.nodes)
.enter().append("g")
.attr("class", "node")
.call(force.drag);
node.append("circle")
.attr("r", 10)
.style("fill", function (d) {
if (d.name.startsWith("other")) { return color(1); } else { return color(2); };
})
node.append("text")
.attr("dx", 10)
.attr("dy", ".35em")
.text(function(d) { return d.name });
//The force layout generates the coordinates; on each tick this handler writes them back into the SVG element attributes
force.on("tick", function () {
link.attr("x1", function (d) {
return d.source.x;
})
.attr("y1", function (d) {
return d.source.y;
})
.attr("x2", function (d) {
return d.target.x;
})
.attr("y2", function (d) {
return d.target.y;
});
d3.selectAll("circle").attr("cx", function (d) {
return d.x;
})
.attr("cy", function (d) {
return d.y;
});
d3.selectAll("text").attr("x", function (d) {
return d.x;
})
.attr("y", function (d) {
return d.y;
});
});
</script>
</div>
""")
}
def help() = {
displayHTML("""
<p>
Produces a force-directed graph given a collection of edges of the following form:</br>
<tt><font color="#a71d5d">case class</font> <font color="#795da3">Edge</font>(<font color="#ed6a43">src</font>: <font color="#a71d5d">String</font>, <font color="#ed6a43">dest</font>: <font color="#a71d5d">String</font>, <font color="#ed6a43">count</font>: <font color="#a71d5d">Long</font>)</tt>
</p>
<p>Usage:<br/>
<tt><font color="#a71d5d">import</font> <font color="#ed6a43">d3._</font></tt><br/>
<tt><font color="#795da3">graphs.force</font>(</br>
<font color="#ed6a43">height</font> = <font color="#795da3">500</font>,<br/>
<font color="#ed6a43">width</font> = <font color="#795da3">500</font>,<br/>
<font color="#ed6a43">clicks</font>: <font color="#795da3">Dataset</font>[<font color="#795da3">Edge</font>])</tt>
</p>""")
}
}
Warning: classes defined within packages cannot be redefined without a cluster restart.
Compilation successful.
import d3._
import org.apache.spark.sql.functions.lit
val G0 = roadGraph.edges.toDF().select($"srcId".as("src"), $"dstId".as("dest"), lit(1L).as("count"))
d3.graphs.force(
height = 800,
width = 800,
clicks = G0.as[d3.Edge])
import com.esri.core.geometry.GeometryEngine.geodesicDistanceOnWGS84
import com.esri.core.geometry.Point
import com.esri.core.geometry.GeometryEngine.geodesicDistanceOnWGS84
import com.esri.core.geometry.Point
val weightedRoadGraph = roadGraph.mapTriplets{triplet => //mapTriplets gives EdgeTriplet https://spark.apache.org/docs/2.3.1/api/java/org/apache/spark/graphx/EdgeTriplet.html
def dist(lat1: Double, long1: Double, lat2: Double, long2: Double): Double = {
val p1 = new Point(long1, lat1)
val p2 = new Point(long2, lat2)
geodesicDistanceOnWGS84(p1, p2)
}
//A triplet represents an edge along with the vertex attributes of its neighboring vertices (srcAttr, dstAttr)
//triplet.attr is the same as edge.attr
val wayNodesInBuff = triplet.dstAttr._3(triplet.attr)._1 //dstAttr is the vertex attribute (latitude, longitude, wayMap(wayId -> inBuff, outBuff))
// inBuff -> array(nodeId, lat, long)
if (wayNodesInBuff.isEmpty) {
(triplet.attr, dist(triplet.srcAttr._1, triplet.srcAttr._2, triplet.dstAttr._1, triplet.dstAttr._2))
} else {
var distance: Double = 0.0
//adds the distance between the src node and the first node in the InBuff
distance += dist(triplet.srcAttr._1, triplet.srcAttr._2, wayNodesInBuff(0)._2, wayNodesInBuff(0)._3 )
//more than one node in the inBuffer
if (wayNodesInBuff.length > 1) {
//adds the distance between every pair of nodes inside the inBuffer
distance += wayNodesInBuff.sliding(2).map{
buff => dist(buff(0)._2, buff(0)._3, buff(1)._2, buff(1)._3)}
.reduce(_ + _)
}
//adds the distance between the dst node and the last node in the InBuff
distance += dist(wayNodesInBuff.last._2, wayNodesInBuff.last._3, triplet.dstAttr._1, triplet.dstAttr._2)
(triplet.attr, distance)
}
}.cache
weightedRoadGraph: org.apache.spark.graphx.Graph[(Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]),(Long, Double)] = org.apache.spark.graphx.impl.GraphImpl@9eb578f
weightedRoadGraph.edges.count()
res36: Long = 8
weightedRoadGraph.edges.take(8).foreach(println)
Edge(3963994985,25735257,(393182257,68.4570414333903))
Edge(455006648,312353,(733389337,74.36517408391025))
Edge(312353,25734373,(299906437,100.7353398484194))
Edge(312363,25735257,(263934973,94.17321564547117))
Edge(25735257,3067700641,(263934973,39.0782384063323))
Edge(25734373,3431600977,(73834008,67.891710670905))
Edge(3067700641,2206536278,(302521477,82.6456450149808))
Edge(3067700668,312363,(263934971,5.743347106374985))
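Each edge weight above is a sum of leg distances through the inBuf nodes, computed with esri's `geodesicDistanceOnWGS84`. As a self-contained stand-in for that call, a haversine approximation gives comparable metre-scale values; the helper below is not part of the notebook:

```scala
// Haversine great-circle distance in metres (spherical approximation
// of the WGS84 geodesic distance used above).
def haversine(lat1: Double, lon1: Double,
              lat2: Double, lon2: Double): Double = {
  val R = 6371000.0 // mean Earth radius in metres
  val dLat = math.toRadians(lat2 - lat1)
  val dLon = math.toRadians(lon2 - lon1)
  val a = math.pow(math.sin(dLat / 2), 2) +
    math.cos(math.toRadians(lat1)) * math.cos(math.toRadians(lat2)) *
    math.pow(math.sin(dLon / 2), 2)
  2 * R * math.asin(math.sqrt(a))
}

// Edge weight = sum of leg distances src -> inBuf nodes -> dst
def edgeWeight(points: Seq[(Double, Double)]): Double =
  points.sliding(2).map { case Seq(p, q) =>
    haversine(p._1, p._2, q._1, q._2)
  }.sum
```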
weightedRoadGraph.vertices.count()
res38: Long = 11
weightedRoadGraph.vertices.map(node => node._1).take(11)
res39: Array[org.apache.spark.graphx.VertexId] = Array(25812013, 455006648, 25735257, 3431600977, 3963994985, 3067700641, 312353, 312363, 3067700668, 25734373, 2206536278)
weightedRoadGraph.vertices.take(11)
res40: Array[(org.apache.spark.graphx.VertexId, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = Array((25812013,(59.8578769,17.641676,Map(4281074 -> (Array((-1,0.0,0.0)),Array((-1,0.0,0.0)))))), (455006648,(59.857930700000004,17.6450031,Map(733389337 -> (Array(),Array()), 302521479 -> (Array((-1,0.0,0.0)),Array((-1,0.0,0.0)))))), (25735257,(59.8569759,17.644382,Map(393182257 -> (Array(),Array()), 263934973 -> (Array((3067700665,59.8575443,17.6433633)),Array())))), (3431600977,(59.85631480000001,17.6479153,Map(73834008 -> (Array((312352,59.85636590000001,17.6478229)),Array())))), (3963994985,(59.857381800000006,17.645299100000003,Map(393182257 -> (Array(),Array())))), (3067700641,(59.856720800000005,17.6448606,Map(263934973 -> (Array(),Array()), 302521477 -> (Array(),Array())))), (312353,(59.857437700000006,17.645897700000003,Map(733389337 -> (Array((1523899738,59.8575528,17.645685500000003)),Array()), 299906437 -> (Array(),Array())))), (312363,(59.857601900000006,17.6432529,Map(263934973 -> (Array(),Array()), 263934971 -> (Array(),Array())))), (3067700668,(59.857640200000006,17.6431843,Map(263934971 -> (Array(),Array())))), (25734373,(59.8567674,17.6471041,Map(299906437 -> (Array((801437007,59.8571596,17.6463952), (2187779764,59.856883200000006,17.6468947)),Array()), 73834008 -> (Array(),Array())))), (2206536278,(59.85618040000001,17.6458707,Map(302521477 -> (Array((2206536285,59.8563708,17.645517400000003), (25734470,59.8562881,17.6456634)),Array())))))
import org.apache.spark.graphx.{Edge => Edges}
val splittedEdges = weightedRoadGraph.triplets.flatMap{triplet => {
def dist(lat1: Double, long1: Double, lat2: Double, long2: Double): Double = {
val p1 = new Point(long1, lat1)
val p2 = new Point(long2, lat2)
geodesicDistanceOnWGS84(p1, p2)
}
val maxDist = 200
var finalResult = Array[(Edges[(Long, Double)], (Long, (Double, Double, Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])])), (Long, (Double, Double, Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])])))]()
if(triplet.attr._2 > maxDist){
val wayId = triplet.attr._1
var wayNodesBuff = triplet.dstAttr._3(wayId)._1
var wayNodesBuffSize = wayNodesBuff.length
if(wayNodesBuffSize > 0){
var previousSrc = triplet.srcId
var distance: Double = 0.0
var currentBuff = Array[(Long, Double, Double)]()
distance += dist(triplet.srcAttr._1, triplet.srcAttr._2, wayNodesBuff(0)._2, wayNodesBuff(0)._3)
var newVertex = (triplet.srcId, triplet.srcAttr)
var previousVertex = newVertex
if (distance > maxDist){
newVertex = (wayNodesBuff(0)._1, (wayNodesBuff(0)._2, wayNodesBuff(0)._3, Map(wayId -> (Array[(Long, Double, Double)](), Array[(Long, Double, Double)]()))))
finalResult +:= (Edges(previousSrc, wayNodesBuff(0)._1, (wayId, distance)), previousVertex, newVertex)
previousVertex = newVertex
distance = 0
previousSrc = wayNodesBuff(0)._1
}
else
{
currentBuff +:= wayNodesBuff(0)
}
//loop through pairs of nodes in the way (in the buffer)
if (wayNodesBuff.length > 1){
wayNodesBuff.sliding(2).foreach{segment => {
val tmp_dst = distance
distance += dist(segment(0)._2, segment(0)._3, segment(1)._2, segment(1)._3)
if (distance > maxDist)
{
if(segment(0)._1 != previousSrc){
// Vertex(nodeId, (lat, long, Map(wayId->inBuff, outBuff)))
newVertex = (segment(0)._1, (segment(0)._2, segment(0)._3, Map(wayId -> (currentBuff, Array[(Long, Double, Double)]()))) )
//adds the edge to the array
finalResult +:= (Edges(previousSrc, segment(0)._1, (wayId, tmp_dst)), previousVertex, newVertex)
previousVertex = newVertex
distance -= tmp_dst
previousSrc = segment(0)._1
currentBuff = Array[(Long, Double, Double)]()
}
}
else
{
currentBuff +:= segment(0)
}
}}}
//from last node in the inBuff to the dst
val tmp_dist = distance
distance += dist(wayNodesBuff.last._2, wayNodesBuff.last._3, triplet.dstAttr._1, triplet.dstAttr._2)
if (distance > maxDist){
if (wayNodesBuff.last._1 != previousSrc){
newVertex = (wayNodesBuff.last._1, (wayNodesBuff.last._2, wayNodesBuff.last._3, Map(wayId -> (currentBuff, Array[(Long, Double, Double)]()))))
finalResult +:= (Edges(previousSrc, wayNodesBuff.last._1, (wayId, tmp_dist)), previousVertex, newVertex)
previousVertex = newVertex
distance -= tmp_dist
previousSrc = wayNodesBuff.last._1
currentBuff = Array[(Long, Double, Double)]()
newVertex = (triplet.dstId, (triplet.dstAttr._1, triplet.dstAttr._2, Map(wayId -> (currentBuff, triplet.dstAttr._3(wayId)._2))) )
}
}
finalResult +:= (Edges(previousSrc, triplet.dstId, (wayId, distance)), previousVertex, newVertex)
}
// Distance > threshold but no nodes in the way (buffer)
else
{
finalResult +:= (Edges(triplet.srcId, triplet.dstId, triplet.attr), (triplet.srcId, triplet.srcAttr), (triplet.dstId, triplet.dstAttr))
}
}
// Distance < threshold
else
{
finalResult +:= (Edges(triplet.srcId, triplet.dstId, triplet.attr), (triplet.srcId, triplet.srcAttr), (triplet.dstId, triplet.dstAttr))
}
// return
finalResult
}}
import org.apache.spark.graphx.{Edge=>Edges}
splittedEdges: org.apache.spark.rdd.RDD[(org.apache.spark.graphx.Edge[(Long, Double)], (Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])])), (Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])])))] = MapPartitionsRDD[776] at flatMap at command-588572986432369:2
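The flatMap above emits a cut whenever the distance accumulated along the buffered way nodes exceeds `maxDist`. The accumulation logic can be sketched on plain Scala collections; this is an illustration only, with a Euclidean `dist` standing in for `geodesicDistanceOnWGS84`, and `SplitSketch`/`splitPolyline` being hypothetical names:

```scala
// Sketch only: cut a polyline into pieces of accumulated length <= maxDist,
// mirroring the accumulation in the flatMap above. Here dist is plain
// Euclidean distance, standing in for geodesicDistanceOnWGS84.
object SplitSketch {
  type Pt = (Double, Double)

  def dist(a: Pt, b: Pt): Double = math.hypot(a._1 - b._1, a._2 - b._2)

  // Returns the cut points: a point closes a segment as soon as the
  // accumulated distance since the last cut exceeds maxDist.
  def splitPolyline(points: Seq[Pt], maxDist: Double): Seq[Pt] = {
    var acc = 0.0
    val cuts = scala.collection.mutable.ArrayBuffer[Pt]()
    points.zip(points.drop(1)).foreach { case (a, b) =>
      acc += dist(a, b)
      if (acc > maxDist) { cuts += b; acc = 0.0 }
    }
    cuts.toSeq
  }
}
```

For unit-spaced points and `maxDist = 1.5`, every second point becomes a cut, just as every buffered way node past the threshold becomes a new vertex above.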
// Taking each edge and its reverse
val segmentedEdges = splittedEdges.flatMap{case(edge, srcVertex, dstVertex) => Array(edge) ++ Array(Edges(edge.dstId, edge.srcId, edge.attr))}
segmentedEdges.count()
segmentedEdges: org.apache.spark.rdd.RDD[org.apache.spark.graphx.Edge[(Long, Double)]] = MapPartitionsRDD[777] at flatMap at command-588572986432370:2
res70: Long = 16
segmentedEdges.take(36).foreach(println)
Edge(3963994985,25735257,(393182257,68.4570414333903))
Edge(25735257,3963994985,(393182257,68.4570414333903))
Edge(455006648,312353,(733389337,74.36517408391025))
Edge(312353,455006648,(733389337,74.36517408391025))
Edge(2187779764,25734373,(299906437,17.439956081003103))
Edge(25734373,2187779764,(299906437,17.439956081003103))
Edge(312353,2187779764,(299906437,83.2953837674163))
Edge(2187779764,312353,(299906437,83.2953837674163))
Edge(312363,25735257,(263934973,94.17321564547117))
Edge(25735257,312363,(263934973,94.17321564547117))
Edge(25735257,3067700641,(263934973,39.0782384063323))
Edge(3067700641,25735257,(263934973,39.0782384063323))
Edge(25734373,3431600977,(73834008,67.891710670905))
Edge(3431600977,25734373,(73834008,67.891710670905))
Edge(3067700641,2206536278,(302521477,82.6456450149808))
Edge(2206536278,3067700641,(302521477,82.6456450149808))
Edge(3067700668,312363,(263934971,5.743347106374985))
Edge(312363,3067700668,(263934971,5.743347106374985))
// Taking the individual vertices
val segmentedVertices = splittedEdges.flatMap{case(edge, srcVertex, dstVertex) => Array(srcVertex) ++ Array(dstVertex)}
segmentedVertices.map(node => node._1).distinct().take(16)
//25812013, 455006648, 25735257, 3431600977, 3963994985, 3067700641, 312353, 312363, 3067700668, 25734373, 2206536278) initial nodes
segmentedVertices: org.apache.spark.rdd.RDD[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = MapPartitionsRDD[778] at flatMap at command-588572986432372:2
res71: Array[Long] = Array(455006648, 25735257, 3431600977, 3963994985, 3067700641, 312353, 312363, 3067700668, 25734373, 2206536278)
// Converting the vertices to a df
val verticesDF = segmentedVertices.toDF("nodeId","attr").select($"nodeId",$"attr._1".as("lat"),$"attr._2".as("long"),explode($"attr._3"))
.withColumnRenamed("key","wayId").withColumnRenamed("value","buffers")
.select($"nodeId",$"lat",$"long",$"wayId",$"buffers._1".as("inBuff"),$"buffers._2".as("outBuff"))
verticesDF.show(24,false)
+----------+------------------+------------------+---------+-----------------------------------------------------------------------------------+----------------+
|nodeId |lat |long |wayId |inBuff |outBuff |
+----------+------------------+------------------+---------+-----------------------------------------------------------------------------------+----------------+
|3963994985|59.857381800000006|17.645299100000003|393182257|[] |[] |
|25735257 |59.8569759 |17.644382 |393182257|[] |[] |
|25735257 |59.8569759 |17.644382 |263934973|[[3067700665, 59.8575443, 17.6433633]] |[] |
|455006648 |59.857930700000004|17.6450031 |733389337|[] |[] |
|455006648 |59.857930700000004|17.6450031 |302521479|[[-1, 0.0, 0.0]] |[[-1, 0.0, 0.0]]|
|312353 |59.857437700000006|17.645897700000003|733389337|[[1523899738, 59.8575528, 17.645685500000003]] |[] |
|312353 |59.857437700000006|17.645897700000003|299906437|[] |[] |
|312353 |59.857437700000006|17.645897700000003|733389337|[[1523899738, 59.8575528, 17.645685500000003]] |[] |
|312353 |59.857437700000006|17.645897700000003|299906437|[] |[] |
|25734373 |59.8567674 |17.6471041 |299906437|[[801437007, 59.8571596, 17.6463952], [2187779764, 59.856883200000006, 17.6468947]]|[] |
|25734373 |59.8567674 |17.6471041 |73834008 |[] |[] |
|312363 |59.857601900000006|17.6432529 |263934973|[] |[] |
|312363 |59.857601900000006|17.6432529 |263934971|[] |[] |
|25735257 |59.8569759 |17.644382 |393182257|[] |[] |
|25735257 |59.8569759 |17.644382 |263934973|[[3067700665, 59.8575443, 17.6433633]] |[] |
|25735257 |59.8569759 |17.644382 |393182257|[] |[] |
|25735257 |59.8569759 |17.644382 |263934973|[[3067700665, 59.8575443, 17.6433633]] |[] |
|3067700641|59.856720800000005|17.6448606 |263934973|[] |[] |
|3067700641|59.856720800000005|17.6448606 |302521477|[] |[] |
|25734373 |59.8567674 |17.6471041 |299906437|[[801437007, 59.8571596, 17.6463952], [2187779764, 59.856883200000006, 17.6468947]]|[] |
|25734373 |59.8567674 |17.6471041 |73834008 |[] |[] |
|3431600977|59.85631480000001 |17.6479153 |73834008 |[[312352, 59.85636590000001, 17.6478229]] |[] |
|3067700641|59.856720800000005|17.6448606 |263934973|[] |[] |
|3067700641|59.856720800000005|17.6448606 |302521477|[] |[] |
+----------+------------------+------------------+---------+-----------------------------------------------------------------------------------+----------------+
only showing top 24 rows
verticesDF: org.apache.spark.sql.DataFrame = [nodeId: bigint, lat: double ... 4 more fields]
//unique wayIds of the edges
val nodesWayId = splittedEdges.map{case(edge, srcVertex, dstVertex) => edge.attr._1}.toDF("nodesWayId").dropDuplicates()
nodesWayId.show(10)
+----------+
|nodesWayId|
+----------+
| 393182257|
| 733389337|
| 299906437|
| 263934973|
| 73834008|
| 302521477|
| 263934971|
+----------+
nodesWayId: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [nodesWayId: bigint]
// Keep only vertices that have a wayId in their Map which is not included in any edge
// A dead end means the way contains no other intersection vertex
val verticesWithDeadEndWays = verticesDF.join(nodesWayId, $"nodesWayId" === $"wayId", "leftanti") //leftanti is a special join which returns the rows that don't match
verticesWithDeadEndWays.show(20,false)
+---------+------------------+----------+---------+----------------+----------------+
|nodeId |lat |long |wayId |inBuff |outBuff |
+---------+------------------+----------+---------+----------------+----------------+
|455006648|59.857930700000004|17.6450031|302521479|[[-1, 0.0, 0.0]]|[[-1, 0.0, 0.0]]|
+---------+------------------+----------+---------+----------------+----------------+
verticesWithDeadEndWays: org.apache.spark.sql.DataFrame = [nodeId: bigint, lat: double ... 4 more fields]
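The `leftanti` join used above keeps exactly the left rows that have no match on the right. A plain-Scala sketch of the same semantics (`AntiJoinSketch` and `leftAnti` are hypothetical names, not Spark APIs):

```scala
// Sketch only: leftanti join semantics on plain collections -- keep exactly
// the left rows whose key has no match on the right.
object AntiJoinSketch {
  def leftAnti[A, K](left: Seq[A], rightKeys: Set[K])(key: A => K): Seq[A] =
    left.filterNot(a => rightKeys.contains(key(a)))
}
```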
//convert df to rdd to be joined later with the rest of the vertices
import scala.collection.mutable.WrappedArray
val verticesWithDeadEndWaysRDD = verticesWithDeadEndWays.rdd.map(row => (row.getLong(0),(row.getDouble(1),row.getDouble(2),Map(row.getLong(3)-> (row.getAs[WrappedArray[(Long, Double, Double)]](4).array,row.getAs[WrappedArray[(Long, Double, Double)]](5).array)))))
verticesWithDeadEndWaysRDD.take(10)
import scala.collection.mutable.WrappedArray
verticesWithDeadEndWaysRDD: org.apache.spark.rdd.RDD[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = MapPartitionsRDD[820] at map at command-588572986432376:3
res80: Array[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = Array((455006648,(59.857930700000004,17.6450031,Map(302521479 -> (Array([-1,0.0,0.0]),Array([-1,0.0,0.0]))))))
// For a node appearing in several ways, return one vertex per way
val verticesWithSharedWays = splittedEdges.flatMap{case(edge, srcVertex, dstVertex) =>
{
val srcVertex1 = (srcVertex._1,(srcVertex._2._1,srcVertex._2._2,Map(edge.attr._1 -> srcVertex._2._3(edge.attr._1))))
val dstVertex1 = (dstVertex._1,(dstVertex._2._1,dstVertex._2._2,Map(edge.attr._1 -> dstVertex._2._3(edge.attr._1))))
Array(srcVertex1) ++ Array(dstVertex1)
}}.distinct()
verticesWithSharedWays.take(10)
verticesWithSharedWays: org.apache.spark.rdd.RDD[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = MapPartitionsRDD[824] at distinct at command-588572986432377:8
res81: Array[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = Array((25735257,(59.8569759,17.644382,Map(393182257 -> (Array(),Array())))), (312363,(59.857601900000006,17.6432529,Map(263934971 -> (Array(),Array())))), (312353,(59.857437700000006,17.645897700000003,Map(299906437 -> (Array(),Array())))), (3963994985,(59.857381800000006,17.645299100000003,Map(393182257 -> (Array(),Array())))), (25735257,(59.8569759,17.644382,Map(263934973 -> (Array((3067700665,59.8575443,17.6433633)),Array())))), (3431600977,(59.85631480000001,17.6479153,Map(73834008 -> (Array((312352,59.85636590000001,17.6478229)),Array())))), (3067700641,(59.856720800000005,17.6448606,Map(302521477 -> (Array(),Array())))), (25734373,(59.8567674,17.6471041,Map(73834008 -> (Array(),Array())))), (2206536278,(59.85618040000001,17.6458707,Map(302521477 -> (Array((2206536285,59.8563708,17.645517400000003), (25734470,59.8562881,17.6456634)),Array())))), (3067700668,(59.857640200000006,17.6431843,Map(263934971 -> (Array(),Array())))))
// Union of verticesWithSharedWays and verticesWithDeadEndWaysRDD, reduced by key by merging the way maps
val allVertices = verticesWithSharedWays.union(verticesWithDeadEndWaysRDD).reduceByKey((a,b) => (a._1, a._2, a._3 ++ b._3))
allVertices.count()
allVertices: org.apache.spark.rdd.RDD[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = ShuffledRDD[826] at reduceByKey at command-588572986432378:2
res82: Long = 10
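The `reduceByKey` above merges duplicate vertices per nodeId by keeping the coordinates and taking the union of the way maps. A sketch of that merge on plain collections, with the `(inBuff, outBuff)` payload simplified to a `String` (`MergeSketch` is an illustrative name):

```scala
// Sketch only: merge duplicate vertices per nodeId by keeping the coordinates
// and taking the union of their way maps, mirroring the reduceByKey above.
object MergeSketch {
  type Attr = (Double, Double, Map[Long, String]) // (lat, long, wayId -> buffers)

  def mergeVertices(vs: Seq[(Long, Attr)]): Map[Long, Attr] =
    vs.groupBy(_._1).map { case (id, grp) =>
      id -> grp.map(_._2).reduce((a, b) => (a._1, a._2, a._3 ++ b._3))
    }
}
```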
import org.apache.spark.graphx.Graph
val segmentedGraph = Graph(allVertices, segmentedEdges).cache()
import org.apache.spark.graphx.Graph
segmentedGraph: org.apache.spark.graphx.Graph[(Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]),(Long, Double)] = org.apache.spark.graphx.impl.GraphImpl@2cb44552
//allVertices.map(vertex => (vertex._1,(vertex._2._1, vertex._2._2))).toDF("id","coordinates").write.mode("overwrite").parquet("dbfs:/graphs/uppsala/vertices")
// spark.read.parquet("dbfs:/graphs/uppsala/edges").rdd.take(1)
res88: Array[org.apache.spark.sql.Row] = Array([2187779764,25734373,[299906437,17.439956081003103]])
segmentedGraph.vertices.take(11)
res33: Array[(org.apache.spark.graphx.VertexId, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = Array((25735257,(59.8569759,17.644382,Map(263934973 -> (Array((3067700665,59.8575443,17.6433633)),Array()), 393182257 -> (Array(),Array())))), (2187779764,(59.856883200000006,17.6468947,Map(299906437 -> (Array((801437007,59.8571596,17.6463952), (801437007,59.8571596,17.6463952)),Array())))), (3431600977,(59.85631480000001,17.6479153,Map(73834008 -> (Array((312352,59.85636590000001,17.6478229)),Array())))), (3963994985,(59.857381800000006,17.645299100000003,Map(393182257 -> (Array(),Array())))), (3067700641,(59.856720800000005,17.6448606,Map(263934973 -> (Array(),Array()), 302521477 -> (Array(),Array())))), (3067700668,(59.857640200000006,17.6431843,Map(263934971 -> (Array(),Array())))), (2206536278,(59.85618040000001,17.6458707,Map(302521477 -> (Array((2206536285,59.8563708,17.645517400000003), (25734470,59.8562881,17.6456634)),Array())))), (455006648,(59.857930700000004,17.6450031,Map(733389337 -> (Array(),Array()), 302521479 -> (Array([-1,0.0,0.0]),Array([-1,0.0,0.0]))))), (312353,(59.857437700000006,17.645897700000003,Map(733389337 -> (Array((1523899738,59.8575528,17.645685500000003)),Array()), 299906437 -> (Array(),Array())))), (312363,(59.857601900000006,17.6432529,Map(263934973 -> (Array(),Array()), 263934971 -> (Array(),Array())))), (25734373,(59.8567674,17.6471041,Map(73834008 -> (Array(),Array()), 299906437 -> (Array(),Array())))))
segmentedGraph.edges.count
res34: Long = 18
segmentedGraph.edges.take(18).foreach(println)
Edge(25735257,3963994985,(393182257,68.4570414333903))
Edge(3963994985,25735257,(393182257,68.4570414333903))
Edge(312353,455006648,(733389337,74.36517408391025))
Edge(455006648,312353,(733389337,74.36517408391025))
Edge(312353,2187779764,(299906437,83.2953837674163))
Edge(25734373,2187779764,(299906437,17.439956081003103))
Edge(2187779764,312353,(299906437,83.2953837674163))
Edge(2187779764,25734373,(299906437,17.439956081003103))
Edge(312363,25735257,(263934973,94.17321564547117))
Edge(25735257,312363,(263934973,94.17321564547117))
Edge(25735257,3067700641,(263934973,39.0782384063323))
Edge(3067700641,25735257,(263934973,39.0782384063323))
Edge(25734373,3431600977,(73834008,67.891710670905))
Edge(3431600977,25734373,(73834008,67.891710670905))
Edge(2206536278,3067700641,(302521477,82.6456450149808))
Edge(3067700641,2206536278,(302521477,82.6456450149808))
Edge(312363,3067700668,(263934971,5.743347106374985))
Edge(3067700668,312363,(263934971,5.743347106374985))
val G1 = segmentedGraph.edges.toDF().select($"srcId".as("src"), $"dstId".as("dest"), lit(1L).as("count"))
d3.graphs.force(
height = 1000,
width = 1000,
clicks = G1.as[d3.Edge])
Creating a road graph from OpenStreetMap (OSM) data with GraphX
Stavroula Rafailia Vlachou (LinkedIn), Virginia Jimenez Mohedano (LinkedIn) and Raazesh Sainudiin (LinkedIn).
This project was supported by SENSMETRY through a Data Science Project Internship
awarded to Stavroula R. Vlachou and Virginia J. Mohedano between 2022-01-17 and 2022-06-05,
and by the Databricks University Alliance with infrastructure credits from AWS to
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.
2022, Uppsala, Sweden
This project builds on top of the work of Dillon George (2016-2018).
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Download the road network representation of Lithuania through OSM data distributed by GeoFabrik: https://download.geofabrik.de/europe/lithuania.html
curl -O https://download.geofabrik.de/europe/lithuania-latest.osm.pbf
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 155M 0 512k 0 0 906k 0 0:02:55 --:--:-- 0:02:55 906k
19 155M 19 30.4M 0 0 19.5M 0 0:00:07 0:00:01 0:00:06 19.5M
41 155M 41 64.1M 0 0 25.0M 0 0:00:06 0:00:02 0:00:04 25.0M
65 155M 65 101M 0 0 28.5M 0 0:00:05 0:00:03 0:00:02 28.5M
92 155M 92 143M 0 0 31.5M 0 0:00:04 0:00:04 --:--:-- 31.4M
100 155M 100 155M 0 0 32.1M 0 0:00:04 0:00:04 --:--:-- 36.3M
dbutils.fs.mv("file:/databricks/driver/lithuania-latest.osm.pbf", "dbfs:/datasets/osm/lithuania/lithuania.osm.pbf")
res6: Boolean = true
import crosby.binary.osmosis.OsmosisReader
import org.apache.hadoop.mapreduce.{TaskAttemptContext, JobContext}
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.openstreetmap.osmosis.core.container.v0_6.EntityContainer
import org.openstreetmap.osmosis.core.domain.v0_6._
import org.openstreetmap.osmosis.core.task.v0_6.Sink
import sqlContext.implicits._
import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.Map
import scala.collection.JavaConversions._
import org.apache.spark.sql.functions._
import org.apache.spark.graphx._
import crosby.binary.osmosis.OsmosisReader
import org.apache.hadoop.mapreduce.{TaskAttemptContext, JobContext}
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.openstreetmap.osmosis.core.container.v0_6.EntityContainer
import org.openstreetmap.osmosis.core.domain.v0_6._
import org.openstreetmap.osmosis.core.task.v0_6.Sink
import sqlContext.implicits._
import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.Map
import scala.collection.JavaConversions._
import org.apache.spark.sql.functions._
import org.apache.spark.graphx._
- To ingest the entire Lithuanian OSM road network dataset, the PBF file obtained from OSM is transformed into three Parquet files, one for each primitive (nodes, ways and relations), using the osm-parquetizer project. The first two files, corresponding to the nodes and ways, are then moved into the distributed file system for further processing.
Install the osm-parquetizer on the cluster
Clone the osm-parquetizer repository and build the library, then upload the resulting jar to the cluster.
It will later be used to load the OSM data faster.
//Run this command only once per cluster
%sh
java -jar /dbfs/FileStore/jars/2706d711_3963_4d88_92e7_a8870d0164d1-osm_parquetizer_1_0_1_SNAPSHOT-80d25.jar /dbfs/datasets/osm/lithuania/lithuania.osm.pbf
2022-04-06 07:40:54 INFO CodecPool:153 - Got brand-new compressor [.snappy]
2022-04-06 07:40:55 INFO CodecPool:153 - Got brand-new compressor [.snappy]
2022-04-06 07:40:55 INFO CodecPool:153 - Got brand-new compressor [.snappy]
2022-04-06 07:40:58 INFO App$MultiEntitySinkObserver:118 - Entities processed: 1000000
2022-04-06 07:40:59 INFO App$MultiEntitySinkObserver:118 - Entities processed: 2000000
2022-04-06 07:41:00 INFO App$MultiEntitySinkObserver:118 - Entities processed: 3000000
2022-04-06 07:41:02 INFO App$MultiEntitySinkObserver:118 - Entities processed: 4000000
2022-04-06 07:41:03 INFO App$MultiEntitySinkObserver:118 - Entities processed: 5000000
2022-04-06 07:41:04 INFO App$MultiEntitySinkObserver:118 - Entities processed: 6000000
2022-04-06 07:41:11 INFO App$MultiEntitySinkObserver:118 - Entities processed: 7000000
2022-04-06 07:41:12 INFO App$MultiEntitySinkObserver:118 - Entities processed: 8000000
2022-04-06 07:41:13 INFO App$MultiEntitySinkObserver:118 - Entities processed: 9000000
2022-04-06 07:41:14 INFO App$MultiEntitySinkObserver:118 - Entities processed: 10000000
2022-04-06 07:41:15 INFO App$MultiEntitySinkObserver:118 - Entities processed: 11000000
2022-04-06 07:41:16 INFO App$MultiEntitySinkObserver:118 - Entities processed: 12000000
2022-04-06 07:41:23 INFO App$MultiEntitySinkObserver:118 - Entities processed: 13000000
2022-04-06 07:41:24 INFO App$MultiEntitySinkObserver:118 - Entities processed: 14000000
2022-04-06 07:41:25 INFO App$MultiEntitySinkObserver:118 - Entities processed: 15000000
2022-04-06 07:41:25 INFO App$MultiEntitySinkObserver:118 - Entities processed: 16000000
2022-04-06 07:41:27 INFO App$MultiEntitySinkObserver:118 - Entities processed: 17000000
2022-04-06 07:41:28 INFO App$MultiEntitySinkObserver:118 - Entities processed: 18000000
2022-04-06 07:41:29 INFO App$MultiEntitySinkObserver:118 - Entities processed: 19000000
2022-04-06 07:41:36 INFO App$MultiEntitySinkObserver:118 - Entities processed: 20000000
2022-04-06 07:41:37 INFO App$MultiEntitySinkObserver:118 - Entities processed: 21000000
2022-04-06 07:41:43 INFO App$MultiEntitySinkObserver:118 - Entities processed: 22000000
2022-04-06 07:41:48 INFO App$MultiEntitySinkObserver:118 - Entities processed: 23000000
2022-04-06 07:42:01 INFO App$MultiEntitySinkObserver:125 - Total entities processed: 23727209
ls /dbfs/datasets/osm/lithuania/
lithuania.osm.pbf
lithuania.osm.pbf.node.parquet
lithuania.osm.pbf.relation.parquet
lithuania.osm.pbf.way.parquet
Read the parquet files of the nodes and ways obtained from the osm-parquetizer.
spark.conf.set("spark.sql.parquet.binaryAsString", true)
val nodes_df = spark.read.parquet("dbfs:/datasets/osm/lithuania/lithuania.osm.pbf.node.parquet")
val ways_df = spark.read.parquet("dbfs:/datasets/osm/lithuania/lithuania.osm.pbf.way.parquet")
nodes_df: org.apache.spark.sql.DataFrame = [id: bigint, version: int ... 7 more fields]
ways_df: org.apache.spark.sql.DataFrame = [id: bigint, version: int ... 6 more fields]
- The list of way tags chosen for this work. For the semantic meaning of each tag, see the OSM description. The list is non-exhaustive and should be adapted to the desired granularity and level of detail of the project at hand.
val allowableWays = Seq(
"motorway",
"motorway_link",
"trunk",
"trunk_link",
"primary",
"primary_link",
"secondary",
"secondary_link",
"tertiary",
"tertiary_link",
"living_street",
"residential",
"road",
"construction",
"motorway_junction"
)
allowableWays: Seq[String] = List(motorway, motorway_link, trunk, trunk_link, primary, primary_link, secondary, secondary_link, tertiary, tertiary_link, living_street, residential, road, construction, motorway_junction)
//convert the nodes to Dataset containing the fields of interest
case class NodeEntry(nodeId: Long, latitude: Double, longitude: Double, tags: Seq[String])
val nodeDS = nodes_df.map(node =>
NodeEntry(node.getAs[Long]("id"),
node.getAs[Double]("latitude"),
node.getAs[Double]("longitude"),
node.getAs[Seq[Row]]("tags").map{case Row(key:String, value:String) => value}
)).cache()
defined class NodeEntry
nodeDS: org.apache.spark.sql.Dataset[NodeEntry] = [nodeId: bigint, latitude: double ... 2 more fields]
nodeDS.count()
res2: Long = 21212155
//convert the ways to Dataset containing the fields of interest
case class WayEntry(wayId: Long, tags: Array[String], nodes: Array[Long])
val wayDS = ways_df.flatMap(way => {
val tagSet = way.getAs[Seq[Row]]("tags").map{case Row(key:String, value:String) => value}.toArray
if (tagSet.intersect(allowableWays).nonEmpty ){
Array(WayEntry(way.getAs[Long]("id"),
tagSet,
way.getAs[Seq[Row]]("nodes").map{case Row(index:Integer, nodeId:Long) => nodeId}.toArray
))
}
else { Array[WayEntry]()}
}
).cache()
defined class WayEntry
wayDS: org.apache.spark.sql.Dataset[WayEntry] = [wayId: bigint, tags: array<string> ... 1 more field]
wayDS.count()
res4: Long = 137540
val nodeCounts = wayDS
.select(explode('nodes).as("node"))
.groupBy('node).count
nodeCounts: org.apache.spark.sql.DataFrame = [node: bigint, count: bigint]
- An intersection node is defined here as a node that lies in at least two ways.
val intersectionNodes = nodeCounts.filter('count >= 2).select('node.alias("intersectionNode"))
val true_intersections = intersectionNodes
intersectionNodes: org.apache.spark.sql.DataFrame = [intersectionNode: bigint]
true_intersections: org.apache.spark.sql.DataFrame = [intersectionNode: bigint]
intersectionNodes.count()
res8: Long = 162325
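The intersection extraction above counts how often each node occurs across all ways and keeps those seen at least twice. The same idea on plain Scala collections (`IntersectionSketch` is a hypothetical name):

```scala
// Sketch only: a node is an intersection if it occurs in the node lists of
// the ways at least twice, mirroring explode + groupBy + count above.
object IntersectionSketch {
  def intersections(ways: Seq[Seq[Long]]): Set[Long] =
    ways.flatten
      .groupBy(identity)
      .collect { case (node, occurrences) if occurrences.size >= 2 => node }
      .toSet
}
```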
val distinctNodesWays = wayDS.flatMap(_.nodes).distinct //the distinct nodes within the ways
distinctNodesWays: org.apache.spark.sql.Dataset[Long] = [value: bigint]
distinctNodesWays.count()
res10: Long = 1299907
val wayNodes = nodeDS.as("nodes")
.joinWith(distinctNodesWays.as("ways"), $"ways.value" === $"nodes.nodeId")
.map(_._1).cache
wayNodes: org.apache.spark.sql.Dataset[NodeEntry] = [nodeId: bigint, latitude: double ... 2 more fields]
wayNodes.count()
res12: Long = 1299907
val intersectionSetVal = intersectionNodes.as[Long].collect.toSet // turn intersectionNodes into a Set on the driver
intersectionSetVal: scala.collection.immutable.Set[Long] = Set(3954894392, 1028098141, 8327933356, 1192596601, 1036402120, 5840172474, 691993192, 7280204168, 3837546128, 1509692779, 3774745375, 2888929887, 3882298102, 4456063981, 1812836277, 6219174203, 1132762870, 2704534617, 1036358572, 1314515551, 5887601785, 3472814007, 935011580, 2266417234, 2218477159, 3830971192, 3758026612, 2628269378, 2450295578, 2036730950, 4014928315, 4047561472, 3742211751, 417473667, 710972352, 1240304711, 2344640802, 3175136574, 3610788315, 1152426347, 3843702680, 2135301596, 3463371091, 2578259945, 2272646209, 9288252126, 8659906497, 5046236674, 3882606462, 6853150636, 2348202899, 1827020895, 1034351953, 2872587837, 7598921441, 4441135707, 7154408678, 2143313902, 6358524504, 1827841626, 51401434, 2104687370, 3169288908, 2203661858, 509277213, 7398865298, 2706131803, 7020673974, 2482655992, 410873070, 40599892, 2718581564, 1136446055, 2612123258, 5856761891, 896143820, 1723158680, 3692175721, 7973969958, 2596488268, 2746044544, 1145714624, 1057404723, 412963083, 81203920, 1258193303, 7277561125, 5215875721, 9119375173, 2095081588, 1017873867, 1151243019, 1848119391, 1924034959, 277888047, 3124645299, 3796300978, 34825612, 2234211037, 2378775918, 2533534558, 3387536812, 262278719, 3539046584, 2600271017, 2343507669, 6198589614, 798855679, 2955786090, 31452099, 2255897649, 4069602981, 5821435649, 8510338674, 9118314553, 727235490, 1632026970, 1138846538, 7817950226, 9500011302, 2491378894, 2659296775, 6510669593, 2245343559, 1549190307, 4723634502, 5975664368, 834528382, 1144264294, 3398600261, 2934302676, 2620066696, 2512528358, 1026705634, 3846262838, 4944746627, 4475382645, 2045995025, 2043449667, 2800589982, 6562241076, 6466080351, 1639532336, 8006034862, 4572083119, 4102652469, 2135301330, 1848200775, 2725675664, 321982547, 2379130060, 9236572852, 1834676531, 7342960659, 3481040164, 3773275282, 4723676068, 4508131633, 4426839331, 2419523846, 7279732924, 1156860008, 3591788118, 
1946671545, 1636124896, 3492717581, 4949411117, 1044390922, 6845662470, 371663507, 3385128084, 5962770335, 1242544881, 457526430, 981428783, 4961114540, 2262631506, 2297448445, 9603285842, 1474540484, 5940837836, 1700351712, 2320635099, 2146958637, 9270519896, 4872990619, 2928092139, 4425066655, 2206664581, 7280235988, 1535287197, 1183618876, 6485160551, 4411398966, 3991141456, 1628728176, 4889396284, 4759399991, 3946596452, 2229195631, 6327513918, 2033129308, 2585464548, 1800850400, 1104097855, 1801836841, 5543206580, 1733360477, 2192183580, 1286185458, 1039741678, 3071904626, 1479129180, 8159625958, 8153576878, 137356770, 9512132640, 2889406310, 9282951374, 2772945378, 7087700888, 299690975, 7496589529, 363422207, 1258118530, 3717631194, 2769080208, 873542203, 6203485294, 2213415589, 1826893692, 1295860361, 1991899638, 7142870533, 2636482814, 2228819220, 2486409125, 1408251311, 1163019147, 1986264613, 722809386, 1032791795, 2372253123, 7280236132, 3012402670, 410410926, 1632013830, 4185213749, 8422801035, 5082283666, 8390975598, 717553500, 8903760367, 1426181723, 2486831606, 2706132069, 7082082762, 3629722595, 1827841682, 1663312625, 1116078354, 809918023, 1634421784, 2294078407, 6371632243, 6942832214, 3103437026, 4778848191, 5609948419, 928079952, 2643946553, 2219142846, 3259798128, 972705422, 4532894711, 1147389163, 4200665722, 1621450716, 3446435974, 3508982006, 2914931552, 267993651, 8609633816, 1583332389, 8437833869, 3954868130, 2844494775, 307436405, 9537310383, 1011788287, 1218743199, 289957295, 1751074538, 1156335745, 1146179866, 3281587820, 8742132605, 7194943159, 730037695, 2210300341, 4067081255, 2120972958, 2431147071, 1822938498, 6538086425, 4002602910, 5353103914, 4983226891, 4213376931, 7637618131, 289974999, 897017041, 3784378983, 863076967, 2548257942, 664083103, 2107860391, 1933944950, 316984109, 7234677785, 2078466041, 1643397002, 1947951854, 1022668729, 3923653099, 430592144, 9526786026, 1647829111, 4350345841, 8299422853, 1388346027, 
1333297066, 5713163211, 1610127502, 1788952448, 2458903268, ..., 1421435056, 270904459)
import org.apache.spark.sql.functions.{collect_list, map, udf}
import org.apache.spark.sql.functions._
val remove_first_and_last = udf((x: Seq[Long]) => x.drop(1).dropRight(1))
val nodes = wayDS.
select($"wayId", $"nodes").
withColumn("node", explode($"nodes")).
drop("nodes")
val get_first_and_last = udf((x: Seq[Long]) => {val first = x(0); val last = x.reverse(0); Array(first, last)})
val first_and_last_nodes = wayDS.
select($"wayId", get_first_and_last($"nodes").as("nodes")).
withColumn("node", explode($"nodes")).
drop("nodes")
val dead_end_points = first_and_last_nodes.select($"node").distinct().withColumnRenamed("node", "value")
// Turn intersection set into a dataset to join (all values must be unique)
val intersections = intersectionNodes.union(dead_end_points).distinct
val wayNodesLocated = nodes.join(wayNodes, wayNodes.col("nodeId") === nodes.col("node")).select($"wayId", $"node", $"latitude", $"longitude")
case class MappedWay(wayId: Long, labels_located: Seq[Map[Long, (Boolean, Double, Double)]])
val maps = wayNodesLocated.join(intersections, 'node === 'intersectionNode, "left_outer").
//a left outer join returns all rows from the left DataFrame/Dataset regardless of whether a match is found in the right one
select($"wayId", $"node", $"intersectionNode".isNotNull.as("contains"), $"latitude", $"longitude").
groupBy("wayId").agg(collect_list(map($"node", struct($"contains".as("_1"), $"latitude".as("_2"), $"longitude".as("_3")))).as("labels_located")).as[MappedWay]
val combine = udf((nodes: Seq[Long], labels_located: Seq[scala.collection.immutable.Map[Long, (Boolean, Double, Double)]]) => {
// If labels_located does not contain "node", then it is a start/end node - we assign label = true, latitude = 0, longitude = 0 for it (TODO: revisit this default)
val m = labels_located.map(_.toSeq).flatten.toMap
nodes.map { node => (node, m.getOrElse(node, (true, 0D, 0D))) } //add structure
})
val strSchema = "array<struct<nodeId:long, nodeInfo:struct<label:boolean, latitude:double, longitude: double>>>"
val labeledWays = wayDS.join(maps, "wayId")
.select($"wayId", $"tags", combine($"nodes", $"labels_located").as("labeledNodes").cast(strSchema))
import org.apache.spark.sql.functions.{collect_list, map, udf}
import org.apache.spark.sql.functions._
remove_first_and_last: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,ArrayType(LongType,false),Some(List(ArrayType(LongType,false))))
nodes: org.apache.spark.sql.DataFrame = [wayId: bigint, node: bigint]
get_first_and_last: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,ArrayType(LongType,false),Some(List(ArrayType(LongType,false))))
first_and_last_nodes: org.apache.spark.sql.DataFrame = [wayId: bigint, node: bigint]
dead_end_points: org.apache.spark.sql.DataFrame = [value: bigint]
intersections: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [intersectionNode: bigint]
wayNodesLocated: org.apache.spark.sql.DataFrame = [wayId: bigint, node: bigint ... 2 more fields]
defined class MappedWay
maps: org.apache.spark.sql.Dataset[MappedWay] = [wayId: bigint, labels_located: array<map<bigint,struct<_1:boolean,_2:double,_3:double>>>]
combine: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,ArrayType(StructType(StructField(_1,LongType,false), StructField(_2,StructType(StructField(_1,BooleanType,false), StructField(_2,DoubleType,false), StructField(_3,DoubleType,false)),true)),true),Some(List(ArrayType(LongType,false), ArrayType(MapType(LongType,StructType(StructField(_1,BooleanType,false), StructField(_2,DoubleType,false), StructField(_3,DoubleType,false)),true),true))))
strSchema: String = array<struct<nodeId:long, nodeInfo:struct<label:boolean, latitude:double, longitude: double>>>
labeledWays: org.apache.spark.sql.DataFrame = [wayId: bigint, tags: array<string> ... 1 more field]
case class Intersection(OSMId: Long , latitude: Double, longitude: Double, inBuf: ArrayBuffer[(Long, Double, Double)], outBuf: ArrayBuffer[(Long, Double, Double)])
val segmentedWays = labeledWays.map(way => {
val labeledNodes = way.getAs[Seq[Row]]("labeledNodes").map{case Row(k: Long, Row(v: Boolean, w:Double, x:Double)) => (k, v,w,x)}.toSeq //labeledNodes: (nodeid, label, lat, long)
val wayId = way.getAs[Long]("wayId")
val indexedNodes: Seq[((Long, Boolean, Double, Double), Int)] = labeledNodes.zipWithIndex //pairs each labeled node in the way with its integer index
val intersections = ArrayBuffer[Intersection]()
val currentBuffer = ArrayBuffer[(Long, Double, Double)]()
val way_length = labeledNodes.length //number of nodes in a way
if (way_length == 1) {
val intersect = new Intersection(labeledNodes(0)._1, labeledNodes(0)._3, labeledNodes(0)._4, ArrayBuffer((-1L, 0D, 0D)), ArrayBuffer((-1L, 0D, 0D))) //include lat and long info
var result = Array((intersect.OSMId, intersect.latitude, intersect.longitude, intersect.inBuf.toArray, intersect.outBuf.toArray))
(wayId, result) //return
}
else {
indexedNodes.foreach{ case ((id, isIntersection, latitude, longitude), i) => // id is nodeId and isIntersection is the node's boolean label
if (isIntersection) {
val newEntry = new Intersection(id, latitude, longitude, currentBuffer.clone, ArrayBuffer[(Long, Double, Double)]())
intersections += newEntry
currentBuffer.clear
}
else {
currentBuffer ++= Array((id, latitude, longitude)) //if the node is not an intersection, append (id, latitude, longitude) to the current buffer
}
// If the end of the way is reached while the current buffer is non-empty,
// append the currentBuffer to the last existing intersection
if (i == way_length - 1 && !currentBuffer.isEmpty) {
if (intersections.isEmpty){
intersections += new Intersection(-1, 0D, 0D, currentBuffer, ArrayBuffer[(Long, Double, Double)]())
}
else {
intersections.last.outBuf ++= currentBuffer
}
currentBuffer.clear
}
}
var result = intersections.map(i => (i.OSMId, i.latitude, i.longitude, i.inBuf.toArray, i.outBuf.toArray)).toArray
(wayId, result)
}
})
defined class Intersection
segmentedWays: org.apache.spark.sql.Dataset[(Long, Array[(Long, Double, Double, Array[(Long, Double, Double)], Array[(Long, Double, Double)])])] = [_1: bigint, _2: array<struct<_1:bigint,_2:double,_3:double,_4:array<struct<_1:bigint,_2:double,_3:double>>,_5:array<struct<_1:bigint,_2:double,_3:double>>>>]
val schema = "array<struct<nodeId:bigint,latitude:double,longitude:double,inBuff:array<struct<nodeId:bigint,latitude:double,longitude:double>>,outBuff:array<struct<nodeId:bigint,latitude:double,longitude:double>>>>"
segmentedWays.select($"_1".alias("wayId"), $"_2".cast(schema).alias("nodeInfo")).printSchema()
root
|-- wayId: long (nullable = false)
|-- nodeInfo: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- nodeId: long (nullable = true)
| | |-- latitude: double (nullable = true)
| | |-- longitude: double (nullable = true)
| | |-- inBuff: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- nodeId: long (nullable = true)
| | | | |-- latitude: double (nullable = true)
| | | | |-- longitude: double (nullable = true)
| | |-- outBuff: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- nodeId: long (nullable = true)
| | | | |-- latitude: double (nullable = true)
| | | | |-- longitude: double (nullable = true)
schema: String = array<struct<nodeId:bigint,latitude:double,longitude:double,inBuff:array<struct<nodeId:bigint,latitude:double,longitude:double>>,outBuff:array<struct<nodeId:bigint,latitude:double,longitude:double>>>>
//Unwrap the nested structure of the segmentedWays
val waySegmentDS = segmentedWays.flatMap(way => way._2.map(node => (way._1, node)))
waySegmentDS: org.apache.spark.sql.Dataset[(Long, (Long, Double, Double, Array[(Long, Double, Double)], Array[(Long, Double, Double)]))] = [_1: bigint, _2: struct<_1: bigint, _2: double ... 3 more fields>]
import scala.collection.immutable.Map
val intersectionVertices = waySegmentDS
.map(way =>
//nodeId latitude longitude wayId inBuff outBuff
(way._2._1, (way._2._2, way._2._3, Map(way._1 -> (way._2._4, way._2._5)))))
.rdd
// latitude, long, Map(wayId, inBuff, outBuff)
.reduceByKey((a,b) => (a._1, a._2, a._3 ++ b._3))
//intersectionVertices = RDD[(nodeId, (latitude, longitude, wayMap(wayId -> inBuff, outBuff)))]
import scala.collection.immutable.Map
intersectionVertices: org.apache.spark.rdd.RDD[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = ShuffledRDD[122] at reduceByKey at command-1211269020742696:9
intersectionVertices.count()
res17: Long = 191991
val edges = segmentedWays
.filter(way => way._2.length > 1) //ways with more than one node
.flatMap{ case (wayId, nodes_info) => {
nodes_info.sliding(2)
.flatMap(segment => //each segment is a pair of consecutive nodes
List(Edge(segment(0)._1, segment(1)._1, wayId))
)
}}
edges: org.apache.spark.sql.Dataset[org.apache.spark.graphx.Edge[Long]] = [srcId: bigint, dstId: bigint ... 1 more field]
edges.count()
res19: Long = 237069
sc.setCheckpointDir("/_checkpoint") // just a directory in distributed file system
val edges_rdd = edges.rdd
intersectionVertices.checkpoint()
edges_rdd.checkpoint()
edges_rdd: org.apache.spark.rdd.RDD[org.apache.spark.graphx.Edge[Long]] = MapPartitionsRDD[214] at rdd at command-1211269020742708:2
val roadGraph = Graph(intersectionVertices, edges_rdd).cache
roadGraph: org.apache.spark.graphx.Graph[(Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]),Long] = org.apache.spark.graphx.impl.GraphImpl@69447e5c
import com.esri.core.geometry.GeometryEngine.geodesicDistanceOnWGS84
import com.esri.core.geometry.Point
import com.esri.core.geometry.GeometryEngine.geodesicDistanceOnWGS84
import com.esri.core.geometry.Point
val weightedRoadGraph = roadGraph.mapTriplets{triplet =>
def dist(lat1: Double, long1: Double, lat2: Double, long2: Double): Double = {
val p1 = new Point(long1, lat1)
val p2 = new Point(long2, lat2)
geodesicDistanceOnWGS84(p1, p2)
}
val wayNodesInBuff = triplet.dstAttr._3(triplet.attr)._1 //dstAttr is the vertex attribute (latitude, longitude, wayMap(wayId -> inBuff, outBuff))
if (wayNodesInBuff.isEmpty) {
(triplet.attr, dist(triplet.srcAttr._1, triplet.srcAttr._2, triplet.dstAttr._1, triplet.dstAttr._2))
} else {
var distance: Double = 0.0
distance += dist(triplet.srcAttr._1, triplet.srcAttr._2, wayNodesInBuff(0)._2, wayNodesInBuff(0)._3 )
if (wayNodesInBuff.length > 1) {
//accumulate the intermediate distances
distance += wayNodesInBuff.sliding(2).map{
buff => dist(buff(0)._2, buff(0)._3, buff(1)._2, buff(1)._3)}
.reduce(_ + _)
}
distance += dist(wayNodesInBuff.last._2, wayNodesInBuff.last._3, triplet.dstAttr._1, triplet.dstAttr._2)
(triplet.attr, distance)
}
}.cache
weightedRoadGraph: org.apache.spark.graphx.Graph[(Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]),(Long, Double)] = org.apache.spark.graphx.impl.GraphImpl@1645750f
weightedRoadGraph.edges.count() //number of edges
res21: Long = 237069
weightedRoadGraph.edges.filter(edge => (edge.attr._2 > 100.0)).count() //number of edges longer than the 100-meter distance tolerance
res22: Long = 137207
weightedRoadGraph.vertices.count() //number of vertices
res23: Long = 191991
Step 4 - Construction of Coarsened Road Graph
- The distance tolerance here is set to 100 meters.
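The splitting logic in the next cell can be illustrated on plain Scala collections. This is a simplified sketch under stated assumptions: the `Node` case class and `segment` helper are hypothetical, and a planar Euclidean distance stands in for `geodesicDistanceOnWGS84`.

```scala
// Simplified sketch of distance-threshold segmentation (hypothetical helpers;
// planar distance stands in for geodesicDistanceOnWGS84).
case class Node(id: Long, x: Double, y: Double)

def dist(a: Node, b: Node): Double =
  math.hypot(a.x - b.x, a.y - b.y)

// Walk the chain of nodes, emitting an edge each time the accumulated
// distance since the last emitted vertex exceeds maxDist.
def segment(chain: Seq[Node], maxDist: Double): Seq[(Long, Long, Double)] = {
  val edges = scala.collection.mutable.ArrayBuffer[(Long, Long, Double)]()
  var prev = chain.head
  var acc = 0.0
  for (Seq(a, b) <- chain.sliding(2)) {
    acc += dist(a, b)
    if (acc > maxDist) { edges += ((prev.id, b.id, acc)); prev = b; acc = 0.0 }
  }
  if (acc > 0) edges += ((prev.id, chain.last.id, acc)) // trailing segment
  edges.toSeq
}
```

The cell below does the same walk, but over the `inBuff` node lists stored on each triplet and with geodesic distances.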
import org.apache.spark.graphx.{Edge => Edges}
val splittedEdges = weightedRoadGraph.triplets.flatMap{triplet => {
def dist(lat1: Double, long1: Double, lat2: Double, long2: Double): Double = {
val p1 = new Point(long1, lat1)
val p2 = new Point(long2, lat2)
geodesicDistanceOnWGS84(p1, p2)
}
val maxDist = 100
var finalResult = Array[(Edges[(Long, Double)], (Long, (Double, Double, Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])])), (Long, (Double, Double, Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])])))]()
if(triplet.attr._2 > maxDist){
val wayId = triplet.attr._1
var wayNodesBuff = triplet.dstAttr._3(wayId)._1
var wayNodesBuffSize = wayNodesBuff.length
if(wayNodesBuffSize > 0){
var previousSrc = triplet.srcId
var distance: Double = 0.0
var currentBuff = Array[(Long, Double, Double)]()
distance += dist(triplet.srcAttr._1, triplet.srcAttr._2, wayNodesBuff(0)._2, wayNodesBuff(0)._3)
var newVertex = (triplet.srcId, triplet.srcAttr)
var previousVertex = newVertex
if (distance > maxDist){
newVertex = (wayNodesBuff(0)._1, (wayNodesBuff(0)._2, wayNodesBuff(0)._3, Map(wayId -> (Array[(Long, Double, Double)](), Array[(Long, Double, Double)]()))))
finalResult +:= (Edges(previousSrc, wayNodesBuff(0)._1, (wayId, distance)), previousVertex, newVertex)
previousVertex = newVertex
distance = 0
previousSrc = wayNodesBuff(0)._1
}
else
{
currentBuff +:= wayNodesBuff(0)
}
//loop through pairs of nodes in the way (in the buffer)
if (wayNodesBuff.length > 1){
wayNodesBuff.sliding(2).foreach{segment => {
val tmp_dst = distance
distance += dist(segment(0)._2, segment(0)._3, segment(1)._2, segment(1)._3)
if (distance > maxDist)
{
if(segment(0)._1 != previousSrc){
// Vertex(nodeId, (lat, long, Map(wayId->inBuff, outBuff)))
newVertex = (segment(0)._1, (segment(0)._2, segment(0)._3, Map(wayId -> (currentBuff, Array[(Long, Double, Double)]()))) )
//adds the edge to the array
finalResult +:= (Edges(previousSrc, segment(0)._1, (wayId, tmp_dst)), previousVertex, newVertex)
previousVertex = newVertex
distance -= tmp_dst
previousSrc = segment(0)._1
currentBuff = Array[(Long, Double, Double)]()
}
}
else
{
currentBuff +:= segment(0)
}
}}}
//from last node in the inBuff to the dst
val tmp_dist = distance
distance += dist(wayNodesBuff.last._2, wayNodesBuff.last._3, triplet.dstAttr._1, triplet.dstAttr._2)
if (distance > maxDist){
if (wayNodesBuff.last._1 != previousSrc){
newVertex = (wayNodesBuff.last._1, (wayNodesBuff.last._2, wayNodesBuff.last._3, Map(wayId -> (currentBuff, Array[(Long, Double, Double)]()))))
finalResult +:= (Edges(previousSrc, wayNodesBuff.last._1, (wayId, tmp_dist)), previousVertex, newVertex)
previousVertex = newVertex
distance -= tmp_dist
previousSrc = wayNodesBuff.last._1
currentBuff = Array[(Long, Double, Double)]()
newVertex = (triplet.dstId, (triplet.dstAttr._1, triplet.dstAttr._2, Map(wayId -> (currentBuff, triplet.dstAttr._3(wayId)._2))) )
}
}
finalResult +:= (Edges(previousSrc, triplet.dstId, (wayId, distance)), previousVertex, newVertex)
}
// Distance > threshold but no nodes in the way (buffer)
else
{
finalResult +:= (Edges(triplet.srcId, triplet.dstId, triplet.attr), (triplet.srcId, triplet.srcAttr), (triplet.dstId, triplet.dstAttr))
}
}
// Distance < threshold
else
{
finalResult +:= (Edges(triplet.srcId, triplet.dstId, triplet.attr), (triplet.srcId, triplet.srcAttr), (triplet.dstId, triplet.dstAttr))
}
// return
finalResult
}}
import org.apache.spark.graphx.{Edge=>Edges}
splittedEdges: org.apache.spark.rdd.RDD[(org.apache.spark.graphx.Edge[(Long, Double)], (Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])])), (Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])])))] = MapPartitionsRDD[245] at flatMap at command-1211269020742721:2
splittedEdges.count()
res28: Long = 734682
// Extracting just the edges from the (edge, srcVertex, dstVertex) triples
val segmentedEdges = splittedEdges.flatMap{case(edge, srcVertex, dstVertex) => Array(edge)}
segmentedEdges.count()
segmentedEdges: org.apache.spark.rdd.RDD[org.apache.spark.graphx.Edge[(Long, Double)]] = MapPartitionsRDD[246] at flatMap at command-1211269020742724:2
res29: Long = 734682
// Taking the individual vertices
val segmentedVertices = splittedEdges.flatMap{case(edge, srcVertex, dstVertex) => Array(srcVertex) ++ Array(dstVertex)}
segmentedVertices.map(node => node._1).distinct().count()
segmentedVertices: org.apache.spark.rdd.RDD[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = MapPartitionsRDD[240] at flatMap at command-1211269020742727:2
res27: Long = 685121
// Converting the vertices to a df
val verticesDF = segmentedVertices.toDF("nodeId","attr").select($"nodeId",$"attr._1".as("lat"),$"attr._2".as("long"),explode($"attr._3"))
.withColumnRenamed("key","wayId").withColumnRenamed("value","buffers")
.select($"nodeId",$"lat",$"long",$"wayId",$"buffers._1".as("inBuff"),$"buffers._2".as("outBuff"))
verticesDF.show(1,false)
+----------+---------+------------------+---------+------+-------+
|nodeId |lat |long |wayId |inBuff|outBuff|
+----------+---------+------------------+---------+------+-------+
|5109322585|54.647108|25.128094200000003|137882502|[] |[] |
+----------+---------+------------------+---------+------+-------+
only showing top 1 row
verticesDF: org.apache.spark.sql.DataFrame = [nodeId: bigint, lat: double ... 4 more fields]
//unique wayIds of the edges
val nodesWayId = splittedEdges.map{case(edge, srcVertex, dstVertex) => edge.attr._1}.toDF("nodesWayId").dropDuplicates()
nodesWayId: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [nodesWayId: bigint]
// Only vertices which have a wayId in their Map that is not included in any edge
// Dead end means there is no other intersection vertex in the way
val verticesWithDeadEndWays = verticesDF.join(nodesWayId, $"nodesWayId" === $"wayId", "leftanti")
verticesWithDeadEndWays: org.apache.spark.sql.DataFrame = [nodeId: bigint, lat: double ... 4 more fields]
//convert df to rdd to be joined later with the rest of the vertices
import scala.collection.mutable.WrappedArray
val verticesWithDeadEndWaysRDD = verticesWithDeadEndWays.rdd.map(row => (row.getLong(0),(row.getDouble(1),row.getDouble(2),Map(row.getLong(3)-> (row.getAs[WrappedArray[(Long, Double, Double)]](4).array,row.getAs[WrappedArray[(Long, Double, Double)]](5).array)))))
import scala.collection.mutable.WrappedArray
verticesWithDeadEndWaysRDD: org.apache.spark.rdd.RDD[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = MapPartitionsRDD[264] at map at command-1211269020742731:3
// for a node appearing in different ways, returns one vertex for each way
val verticesWithSharedWays = splittedEdges.flatMap{case(edge, srcVertex, dstVertex) =>
{
val srcVertex1 = (srcVertex._1,(srcVertex._2._1,srcVertex._2._2,Map(edge.attr._1 -> srcVertex._2._3(edge.attr._1))))
val dstVertex1 = (dstVertex._1,(dstVertex._2._1,dstVertex._2._2,Map(edge.attr._1 -> dstVertex._2._3(edge.attr._1))))
Array(srcVertex1) ++ Array(dstVertex1)
}}.distinct()
verticesWithSharedWays: org.apache.spark.rdd.RDD[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = MapPartitionsRDD[268] at distinct at command-1211269020742732:8
//union of verticesWithSharedWays and verticesWithDeadEndWaysRDD, reduced by key by merging the way maps
val allVertices = verticesWithSharedWays.union(verticesWithDeadEndWaysRDD).reduceByKey((a,b) => (a._1, a._2, a._3 ++ b._3))
allVertices.count()
allVertices: org.apache.spark.rdd.RDD[(Long, (Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]))] = ShuffledRDD[270] at reduceByKey at command-1211269020742733:2
res34: Long = 685121
dbutils.fs.mkdirs("/_checkpoint1")
res36: Boolean = true
sc.setCheckpointDir("/_checkpoint1") // just a directory in distributed file system
allVertices.checkpoint()
segmentedEdges.checkpoint()
val coarsened_graph_100 = Graph(allVertices, segmentedEdges)
coarsened_graph_100: org.apache.spark.graphx.Graph[(Double, Double, scala.collection.immutable.Map[Long,(Array[(Long, Double, Double)], Array[(Long, Double, Double)])]),(Long, Double)] = org.apache.spark.graphx.impl.GraphImpl@128b8420
PageRank algorithm in the graph
Stavroula Rafailia Vlachou (LinkedIn), Virginia Jimenez Mohedano (LinkedIn) and Raazesh Sainudiin (LinkedIn).
This project was supported by SENSMETRY through a Data Science Project Internship
between 2022-01-17 and 2022-06-05 to Stavroula R. Vlachou and Virginia J. Mohedano
and Databricks University Alliance with infrastructure credits from AWS to
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.
2022, Uppsala, Sweden
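Before loading the graph, it may help to recall the update that GraphX's PageRank iterates until convergence. The following is a minimal sketch on a hypothetical three-node toy graph, using the same `resetProb = 0.15` as the cells below.

```scala
// Toy PageRank iteration sketch (hypothetical 3-node graph; not GraphX's
// implementation, just the update it converges on).
val out = Map(1L -> Seq(2L, 3L), 2L -> Seq(3L), 3L -> Seq(1L))
val resetProb = 0.15
var rank = Map(1L -> 1.0, 2L -> 1.0, 3L -> 1.0)
for (_ <- 1 to 20) {
  // each vertex splits its rank equally among its out-neighbours
  val contrib = out.toSeq.flatMap { case (v, nbrs) =>
    nbrs.map(n => n -> rank(v) / nbrs.size)
  }.groupBy(_._1).mapValues(_.map(_._2).sum)
  rank = rank.map { case (v, _) =>
    v -> (resetProb + (1 - resetProb) * contrib.getOrElse(v, 0.0))
  }
}
// Node 3 receives mass from both 1 and 2, so it ends with the highest rank.
```

`PageRank.runUntilConvergence` stops when ranks change by less than the given tolerance instead of running a fixed number of sweeps.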
import crosby.binary.osmosis.OsmosisReader
import org.apache.hadoop.mapreduce.{TaskAttemptContext, JobContext}
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.openstreetmap.osmosis.core.container.v0_6.EntityContainer
import org.openstreetmap.osmosis.core.domain.v0_6._
import org.openstreetmap.osmosis.core.task.v0_6.Sink
import sqlContext.implicits._
import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.Map
import scala.collection.JavaConversions._
import org.apache.spark.sql.functions._
import org.apache.spark.graphx._
import crosby.binary.osmosis.OsmosisReader
import org.apache.hadoop.mapreduce.{TaskAttemptContext, JobContext}
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.openstreetmap.osmosis.core.container.v0_6.EntityContainer
import org.openstreetmap.osmosis.core.domain.v0_6._
import org.openstreetmap.osmosis.core.task.v0_6.Sink
import sqlContext.implicits._
import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.Map
import scala.collection.JavaConversions._
import org.apache.spark.sql.functions._
import org.apache.spark.graphx._
spark.conf.set("spark.sql.parquet.binaryAsString", true)
val nodes_df = spark.read.parquet("dbfs:/datasets/osm/lithuania/lithuania.osm.pbf.node.parquet")
case class NodeEntry(nodeId: Long, latitude: Double, longitude: Double, tags: Seq[String])
val nodeDS = nodes_df.map(node =>
NodeEntry(node.getAs[Long]("id"),
node.getAs[Double]("latitude"),
node.getAs[Double]("longitude"),
node.getAs[Seq[Row]]("tags").map{case Row(key:String, value:String) => value}
)).cache()
nodes_df: org.apache.spark.sql.DataFrame = [id: bigint, version: int ... 7 more fields]
defined class NodeEntry
nodeDS: org.apache.spark.sql.Dataset[NodeEntry] = [nodeId: bigint, latitude: double ... 2 more fields]
nodeDS.show(10)
+--------+------------------+------------------+-----------------+
| nodeId| latitude| longitude| tags|
+--------+------------------+------------------+-----------------+
|15389886| 54.7309125|25.239701200000003|[traffic_signals]|
|15389895|54.732171400000006|25.243689500000002| []|
|15389899| 54.7352788| 25.2467356| []|
|15389959| 54.7355529| 25.2458712| []|
|15389961|54.735927100000005|25.245138800000003| []|
|15389967|54.741563400000004|25.238850600000003| []|
|15390015|54.735093600000006| 25.2478942| []|
|15390016|54.734942700000005| 25.2500417| []|
|15390017|54.734759200000006|25.251196200000003| []|
|15390018| 54.7344154| 25.2522184| []|
+--------+------------------+------------------+-----------------+
only showing top 10 rows
val edges_0 = spark.read.parquet("/_checkpoint/edges_LT_initial")
val vertices_0 = spark.read.parquet("/_checkpoint/vertices_LT_initial")
edges_0: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint ... 1 more field]
vertices_0: org.apache.spark.sql.DataFrame = [id: bigint, Map: struct<_1: double, _2: double ... 1 more field>]
import org.apache.spark.graphx.Graph
import org.graphframes.GraphFrame
val r = GraphFrame(vertices_0, edges_0)
import org.apache.spark.graphx.Graph
import org.graphframes.GraphFrame
r: org.graphframes.GraphFrame = GraphFrame(v:[id: bigint, Map: struct<_1: double, _2: double ... 1 more field>], e:[src: bigint, dst: bigint ... 1 more field])
import org.apache.spark.graphx.lib.PageRank
val segmentedGraph = r.toGraphX
// Run PageRank for a fixed number of iterations.
val new_ranks = PageRank.runUntilConvergence(segmentedGraph,tol=0.01,resetProb=0.15).cache()
import org.apache.spark.graphx.lib.PageRank
segmentedGraph: org.apache.spark.graphx.Graph[org.apache.spark.sql.Row,org.apache.spark.sql.Row] = org.apache.spark.graphx.impl.GraphImpl@6c4ec6ca
new_ranks: org.apache.spark.graphx.Graph[Double,Double] = org.apache.spark.graphx.impl.GraphImpl@74c3bb3
segmentedGraph.degrees.sortBy(-_._2).take(10)
res3: Array[(org.apache.spark.graphx.VertexId, Int)] = Array((509277216,10), (429416369,10), (495049689,10), (450263482,10), (417013574,10), (2043449705,9), (1495498282,9), (429511082,9), (2967263581,9), (91091176,9))
val top_ranks = new_ranks.vertices
top_ranks: org.apache.spark.graphx.VertexRDD[Double] = VertexRDDImpl[1581] at RDD at VertexRDD.scala:57
top_ranks.take(1)
res4: Array[(org.apache.spark.graphx.VertexId, Double)] = Array((1935599424,1.28014875095845))
val ranksDS = top_ranks.toDF("id", "PageRank")
ranksDS: org.apache.spark.sql.DataFrame = [id: bigint, PageRank: double]
import org.apache.spark.sql.functions._
val ranks_located = ranksDS.join(nodeDS, ranksDS("id") === nodeDS("nodeId"), "left_outer").orderBy(col("PageRank").desc)
import org.apache.spark.sql.functions._
ranks_located: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [id: bigint, PageRank: double ... 4 more fields]
ranks_located.show(10)
+----------+------------------+----------+------------------+------------------+----+
| id| PageRank| nodeId| latitude| longitude|tags|
+----------+------------------+----------+------------------+------------------+----+
|2370300576| 8.108818241224267|2370300576|54.873593400000004| 24.0557738| []|
|9053620686| 7.344862855346021|9053620686| 55.2685698|22.526177800000003| []|
|9455107664| 7.259922232547948|9455107664|54.889957800000005| 23.8424611| []|
|1804105454| 6.454593752088028|1804105454| 55.26873500000001| 22.5265232| []|
|3722657621|6.4231667975781095|3722657621| 56.3114233| 22.2750071| []|
| 460043992| 6.218147627390615| 460043992| 55.92036|23.292098000000003| []|
| 834596837| 6.012281739910181| 834596837| 54.6699622| 25.3708519| []|
| 293618407| 5.907439750914314| 293618407|55.718506500000004|21.479742700000003| []|
|3722657743| 5.877135433049267|3722657743| 56.3125726| 22.2706684| []|
| 552930949| 5.856110729504616| 552930949|54.862435700000006| 24.4702166| []|
+----------+------------------+----------+------------------+------------------+----+
only showing top 10 rows
ranks_located.where(col("id") === "509277216").show()
+---------+------------------+---------+----------+------------------+-----------------+
| id| PageRank| nodeId| latitude| longitude| tags|
+---------+------------------+---------+----------+------------------+-----------------+
|509277216|2.6820163978577063|509277216|55.9684321|25.585430900000002|[traffic_signals]|
+---------+------------------+---------+----------+------------------+-----------------+
val degrees = segmentedGraph.degrees.sortBy(-_._2).toDF("id","degree")
degrees: org.apache.spark.sql.DataFrame = [id: bigint, degree: int]
ranks_located.join(degrees, ranks_located("id") === degrees("id")).show(10)
segmentedGraph.vertices.count
res20: Long = 191991
segmentedGraph.edges.count
res14: Long = 237069
Map-matching OpenStreetMap Nodes to OpenStreetMap Ways
Stavroula Rafailia Vlachou (LinkedIn) and Raazesh Sainudiin (LinkedIn).
This project was supported by SENSMETRY through a Data Science Project Internship
between 2022-01-17 and 2022-06-05 to Stavroula R. Vlachou
and Databricks University Alliance with infrastructure credits from AWS to
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.
2022, Uppsala, Sweden
What is map-matching?
Map matching is the problem of how to match recorded geographic coordinates to a logical model of the real world, typically using some form of Geographic Information System.
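As a toy illustration of the problem (not the GeoMatch algorithm used below), the naive approach assigns each recorded point to the nearest road segment by brute force. The `Pt`/`Seg` types are hypothetical, and planar distances stand in for geodesic ones.

```scala
// Naive map-matching sketch: nearest-segment assignment (hypothetical data;
// real pipelines use geodesic distance and a spatial index instead of a scan).
case class Pt(x: Double, y: Double)
case class Seg(wayId: Long, a: Pt, b: Pt)

// Distance from point p to the segment (a, b): project p onto the segment,
// clamp to its endpoints, then measure to the closest point.
def pointSegDist(p: Pt, s: Seg): Double = {
  val (dx, dy) = (s.b.x - s.a.x, s.b.y - s.a.y)
  val len2 = dx * dx + dy * dy
  val t =
    if (len2 == 0) 0.0
    else math.max(0.0, math.min(1.0, ((p.x - s.a.x) * dx + (p.y - s.a.y) * dy) / len2))
  math.hypot(p.x - (s.a.x + t * dx), p.y - (s.a.y + t * dy))
}

def matchPoint(p: Pt, ways: Seq[Seg]): Long =
  ways.minBy(s => pointSegDist(p, s)).wayId
```

The brute-force scan is quadratic in practice; GeoMatch avoids it by partitioning space so that only nearby candidates are compared.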
Map-Matching with GeoMatch
GeoMatch is a novel, scalable, and efficient big-data pipeline for large-scale map-matching on Apache Spark. It improves existing spatial big data solutions by utilizing a novel spatial partitioning scheme inspired by Hilbert space-filling curves.
The library can be found in the following git repository GeoMatch.
The necessary files to generate the jar for this work can be found in the following fork https://github.com/StavroulaVlachou/GeoMatch.
Read GeoMatch: Efficient Large-Scale Map Matching on Apache Spark
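To give some intuition for the partitioning scheme, here is the standard conversion of 2-D grid coordinates to a Hilbert-curve index (a textbook sketch, not GeoMatch's actual implementation). Nearby cells tend to receive nearby indices, which is what makes the curve attractive for spatial partitioning:

```scala
// Standard Hilbert-curve index: maps a cell (x, y) of an n x n grid
// (n a power of 2) to its position d along the curve.
object Hilbert {
  def xy2d(n: Int, x0: Int, y0: Int): Int = {
    var (x, y, d) = (x0, y0, 0)
    var s = n / 2
    while (s > 0) {
      val rx = if ((x & s) > 0) 1 else 0
      val ry = if ((y & s) > 0) 1 else 0
      d += s * s * ((3 * rx) ^ ry)
      // Rotate the quadrant so the curve stays continuous.
      if (ry == 0) {
        if (rx == 1) { x = s - 1 - x; y = s - 1 - y }
        val t = x; x = y; y = t
      }
      s /= 2
    }
    d
  }
}
```

For a 2 x 2 grid the curve visits (0,0), (0,1), (1,1), (1,0) in that order; partitioning by ranges of `d` then tends to keep spatially close geometries on the same partition.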
Instructions
git clone git@github.com:StavroulaVlachou/GeoMatch.git
cd Common
mvn compile install
cd ../GeoMatch
mvn compile install
The generated jar files can be found within the target directories. Then:
1. In Databricks, choose Create -> Library and upload the packaged jars.
2. Create a Spark 2.4.0 - Scala 2.11 cluster with the uploaded GeoMatch library installed. If you are already running a cluster and have installed the uploaded library on it, you have to detach and re-attach any notebook currently using that cluster.
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.serializer.KryoSerializer
import org.cusp.bdi.gm.GeoMatch
import org.cusp.bdi.gm.geom.GMPoint
import org.cusp.bdi.gm.geom.GMLineString
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.serializer.KryoSerializer
import org.cusp.bdi.gm.GeoMatch
import org.cusp.bdi.gm.geom.GMPoint
import org.cusp.bdi.gm.geom.GMLineString
import crosby.binary.osmosis.OsmosisReader
import org.apache.hadoop.mapreduce.{TaskAttemptContext, JobContext}
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.openstreetmap.osmosis.core.container.v0_6.EntityContainer
import org.openstreetmap.osmosis.core.domain.v0_6._
import org.openstreetmap.osmosis.core.task.v0_6.Sink
import sqlContext.implicits._
import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.Map
import scala.collection.JavaConversions._
import org.apache.spark.graphx._
import magellan.Point
import crosby.binary.osmosis.OsmosisReader
import org.apache.hadoop.mapreduce.{TaskAttemptContext, JobContext}
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.openstreetmap.osmosis.core.container.v0_6.EntityContainer
import org.openstreetmap.osmosis.core.domain.v0_6._
import org.openstreetmap.osmosis.core.task.v0_6.Sink
import sqlContext.implicits._
import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.Map
import scala.collection.JavaConversions._
import org.apache.spark.graphx._
import magellan.Point
ls /datasets/osm/uppsala
| path | name | size |
|---|---|---|
| dbfs:/datasets/osm/uppsala/.uppsalaTinyR.pbf.node.parquet.crc | .uppsalaTinyR.pbf.node.parquet.crc | 172.0 |
| dbfs:/datasets/osm/uppsala/.uppsalaTinyR.pbf.relation.parquet.crc | .uppsalaTinyR.pbf.relation.parquet.crc | 84.0 |
| dbfs:/datasets/osm/uppsala/.uppsalaTinyR.pbf.way.parquet.crc | .uppsalaTinyR.pbf.way.parquet.crc | 84.0 |
| dbfs:/datasets/osm/uppsala/uppsalaTinyR.pbf | uppsalaTinyR.pbf | 17867.0 |
| dbfs:/datasets/osm/uppsala/uppsalaTinyR.pbf.node.parquet | uppsalaTinyR.pbf.node.parquet | 20829.0 |
| dbfs:/datasets/osm/uppsala/uppsalaTinyR.pbf.relation.parquet | uppsalaTinyR.pbf.relation.parquet | 9394.0 |
| dbfs:/datasets/osm/uppsala/uppsalaTinyR.pbf.way.parquet | uppsalaTinyR.pbf.way.parquet | 9542.0 |
| dbfs:/datasets/osm/uppsala/uppsalaTinyV.osm.pbf | uppsalaTinyV.osm.pbf | 30606.0 |
- Run the following command only once per cluster
java -jar /dbfs/FileStore/jars/2706d711_3963_4d88_92e7_a8870d0164d1-osm_parquetizer_1_0_1_SNAPSHOT-80d25.jar /dbfs/datasets/osm/uppsala/uppsalaTinyR.pbf
2022-04-08 09:42:42 INFO CodecPool:153 - Got brand-new compressor [.snappy]
2022-04-08 09:42:47 INFO CodecPool:153 - Got brand-new compressor [.snappy]
2022-04-08 09:42:47 INFO CodecPool:153 - Got brand-new compressor [.snappy]
2022-04-08 09:42:53 INFO App$MultiEntitySinkObserver:125 - Total entities processed: 896
ls /dbfs/datasets/osm/uppsala/
uppsalaTinyR.pbf
uppsalaTinyR.pbf.node.parquet
uppsalaTinyR.pbf.relation.parquet
uppsalaTinyR.pbf.way.parquet
uppsalaTinyV.osm.pbf
spark.conf.set("spark.sql.parquet.binaryAsString", true)
val nodes_df = spark.read.parquet("dbfs:/datasets/osm/uppsala/uppsalaTinyR.pbf.node.parquet")
val ways_df = spark.read.parquet("dbfs:/datasets/osm/uppsala/uppsalaTinyR.pbf.way.parquet")
nodes_df: org.apache.spark.sql.DataFrame = [id: bigint, version: int ... 7 more fields]
ways_df: org.apache.spark.sql.DataFrame = [id: bigint, version: int ... 6 more fields]
val allowableWays = Seq(
"motorway",
"motorway_link",
"trunk",
"trunk_link",
"primary",
"primary_link",
"secondary",
"secondary_link",
"tertiary",
"tertiary_link",
"living_street",
"residential",
"road",
"construction",
"motorway_junction"
)
allowableWays: Seq[String] = List(motorway, motorway_link, trunk, trunk_link, primary, primary_link, secondary, secondary_link, tertiary, tertiary_link, living_street, residential, road, construction, motorway_junction)
//convert the nodes to Dataset containing the fields of interest
case class NodeEntry(nodeId: Long, latitude: Double, longitude: Double, tags: Seq[String])
val nodeDS = nodes_df.map(node =>
NodeEntry(node.getAs[Long]("id"),
node.getAs[Double]("latitude"),
node.getAs[Double]("longitude"),
node.getAs[Seq[Row]]("tags").map{case Row(key:String, value:String) => value}
)).cache()
defined class NodeEntry
nodeDS: org.apache.spark.sql.Dataset[NodeEntry] = [nodeId: bigint, latitude: double ... 2 more fields]
//convert the ways to Dataset containing the fields of interest
case class WayEntry(wayId: Long, tags: Array[String], nodes: Array[Long])
val wayDS = ways_df.flatMap(way => {
val tagSet = way.getAs[Seq[Row]]("tags").map{case Row(key:String, value:String) => value}.toArray
if (tagSet.intersect(allowableWays).nonEmpty ){
Array(WayEntry(way.getAs[Long]("id"),
tagSet,
way.getAs[Seq[Row]]("nodes").map{case Row(index:Integer, nodeId:Long) => nodeId}.toArray
))
}
else { Array[WayEntry]()}
}
).cache()
defined class WayEntry
wayDS: org.apache.spark.sql.Dataset[WayEntry] = [wayId: bigint, tags: array<string> ... 1 more field]
val distinctNodesWays = wayDS.flatMap(_.nodes).distinct //the distinct nodes within the ways
distinctNodesWays: org.apache.spark.sql.Dataset[Long] = [value: bigint]
val wayNodes = nodeDS.as("nodes") //nodes that are in a way + nodes info from nodeDS
.joinWith(distinctNodesWays.as("ways"), $"ways.value" === $"nodes.nodeId")
.map(_._1).cache
wayNodes: org.apache.spark.sql.Dataset[NodeEntry] = [nodeId: bigint, latitude: double ... 2 more fields]
import org.apache.spark.sql.types.StringType
import org.apache.spark.sql.functions.concat_ws
import org.apache.spark.sql.functions._
val nodes = wayDS.
select($"wayId", $"nodes").
withColumn("node", explode($"nodes")).
drop("nodes")
val wayNodesLocated = nodes
  .join(wayNodes, wayNodes.col("nodeId") === nodes.col("node"))
  .select($"wayId", $"node", $"latitude", $"longitude")
  .groupBy("wayId")
  .agg(collect_list(concat($"latitude", lit(" "), $"longitude")).alias("list_of_coordinates"))
  .withColumn("coordinates_str", concat_ws(",", col("list_of_coordinates")))
  .drop("list_of_coordinates")
wayNodesLocated.show(1, false)
+---------+----------------------------------------------------------+
|wayId |coordinates_str |
+---------+----------------------------------------------------------+
|393182257|59.8569759 17.644382,59.857381800000006 17.645299100000003|
+---------+----------------------------------------------------------+
only showing top 1 row
import org.apache.spark.sql.types.StringType
import org.apache.spark.sql.functions.concat_ws
import org.apache.spark.sql.functions._
nodes: org.apache.spark.sql.DataFrame = [wayId: bigint, node: bigint]
wayNodesLocated: org.apache.spark.sql.DataFrame = [wayId: bigint, coordinates_str: string]
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._
if(!ArcGISRuntimeEnvironment.isInitialized())
{
ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
ArcGISRuntimeEnvironment.initialize()
}
Initializing...
Java version : 1.8.0_282 (Azul Systems, Inc.) amd64
def project_to_meters(lon: String, lat: String): String = {
if(!ArcGISRuntimeEnvironment.isInitialized())
{
ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
ArcGISRuntimeEnvironment.initialize()
}
val initial_point = new Point(lon.toDouble, lat.toDouble, SpatialReference.create(4326)) //WGS84
val reprojection = GeometryEngine.project(initial_point, SpatialReference.create(3035)) //European Grid
reprojection.toString
}
spark.udf.register("project_to_meters", project_to_meters(_:String, _:String):String)
project_to_meters: (lon: String, lat: String)String
res9: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,StringType,Some(List(StringType, StringType)))
val ways_reprojected = wayNodesLocated.rdd
  .map(line => line.toString.replaceAll("\\[","").replaceAll("\\]",""))
  .map(line => {
    val parts = line.replaceAll("\"","").split(",")
    val arrCoords = parts.slice(1, parts.length).map(xyStr => {
      val xy = xyStr.split(' ')
      val reprojection = project_to_meters(xy(1).toString, xy(0).toString)
      val coords = reprojection.replaceAll(",","").replaceAll("\\[","").split(" ").slice(1, reprojection.length)
      val xy_new = coords(0).toString + " " + coords(1).toString
      xy_new
    })
    ("LineString" + " " + parts(0).toString, arrCoords)
  })
val waysDF = ways_reprojected.toDF("LineStringId","coords")
val ways_unpacked = waysDF.select(col("LineStringId"),concat_ws(",",col("coords"))).rdd.map(line => line.toString.replaceAll("\\[","").replaceAll("\\]",""))
ways_unpacked.take(1)
ways_reprojected: org.apache.spark.rdd.RDD[(String, Array[String])] = MapPartitionsRDD[55] at map at command-4438247265478911:1
waysDF: org.apache.spark.sql.DataFrame = [LineStringId: string, coords: array<string>]
ways_unpacked: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[61] at map at command-4438247265478911:3
res10: Array[String] = Array(LineString 393182257,4749494.332253 4107152.617124,4749540.389628 4107203.021679)
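Each line produced above has the shape `LineString <id>,<x> <y>,<x> <y>,...`, and the dense `mapPartitions` one-liners that follow all implement the same parse of it. As a standalone sketch of that parsing step (plain Scala, including the truncation to integer metres that the GeoMatch inputs use):

```scala
// Parse "LineString 393182257,4749494.33 4107152.61,..." into an id
// plus integer-metre coordinate pairs, as fed to GMLineString below.
object WayLineParser {
  def parse(line: String): (String, Array[(Int, Int)]) = {
    val parts = line.replaceAll("\"", "").split(',')
    val coords = parts.drop(1).map { xyStr =>
      val xy = xyStr.trim.split(' ')
      (xy(0).toDouble.toInt, xy(1).toDouble.toInt) // truncate to whole metres
    }
    (parts(0), coords)
  }
}
```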
val rddFirst = ways_unpacked.mapPartitions(_.map(line =>{val parts = line.replaceAll("\"","").split(',');val arrCoords = parts.slice(1, parts.length).map(xyStr => {val xy = xyStr.split(' ');(xy(0).toDouble.toInt, xy(1).toDouble.toInt)});new GMLineString(parts(0), arrCoords)}))
rddFirst: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMLineString] = MapPartitionsRDD[62] at mapPartitions at command-4438247265478912:1
rddFirst.take(1)
res12: Array[org.cusp.bdi.gm.geom.GMLineString] = Array(GMLineString(LineString 393182257,[Lscala.Tuple2;@16eea034))
val rddFirstSet = sc.textFile("FileStore/tables/UUways.csv").mapPartitions(_.map(line =>{val parts = line.replaceAll("\"","").split(',');val arrCoords = parts.slice(1, parts.length).map(xyStr => {val xy = xyStr.split(' ');(xy(0).toDouble.toInt, xy(1).toDouble.toInt)});new GMLineString(parts(0), arrCoords)}))
rddFirstSet: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMLineString] = MapPartitionsRDD[65] at mapPartitions at command-374221935645076:1
rddFirstSet.take(1)
res13: Array[org.cusp.bdi.gm.geom.GMLineString] = Array(GMLineString(LineString 393182257,[Lscala.Tuple2;@2aa93df3))
rddFirstSet.count() //9 ways
res14: Long = 9
val rddSecondSet = sc.textFile("FileStore/tables/UUnodes.csv").mapPartitions(_.map(line => {val parts = line.replaceAll("\"","").split(',');new GMPoint(parts(0), (parts(1).toDouble.toInt, parts(2).toDouble.toInt))}))
rddSecondSet: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMPoint] = MapPartitionsRDD[68] at mapPartitions at command-432075383419156:1
rddSecondSet.take(1)
res15: Array[org.cusp.bdi.gm.geom.GMPoint] = Array(GMPoint(Point 312352,(4749694,4107105)))
rddSecondSet.count() //626 nodes to be map-matched
res16: Long = 626
val geoMatch = new GeoMatch(false, 16, 150, (-1, -1, -1, -1)) //n (= dimension of the Hilbert curve) should be a power of 2.
geoMatch: org.cusp.bdi.gm.GeoMatch = GeoMatch(false,16,150.0,(-1,-1,-1,-1))
val resultRDD = geoMatch.spatialJoinKNN(rddFirst, rddSecondSet, 1, false)
resultRDD: org.apache.spark.rdd.RDD[(org.cusp.bdi.gm.geom.GMPoint, scala.collection.mutable.ListBuffer[org.cusp.bdi.gm.geom.GMLineString])] = MapPartitionsRDD[83] at mapPartitions at GeoMatch.scala:94
resultRDD.filter(element => (element._2.isEmpty)).count() //number of nodes that are not matched successfully
res19: Long = 44
resultRDD.map(element => (element._1.payload, element._2.map(_.payload))).filter(element => !(element._2.isEmpty)).toDF("pointId", "matchId").show(5, false)
+---------------+----------------------+
|pointId |matchId |
+---------------+----------------------+
|Point 312363 |[LineString 263934971]|
|Point 25724030 |[LineString 263934971]|
|Point 25735257 |[LineString 263934973]|
|Point 25812013 |[LineString 263934971]|
|Point 390925129|[LineString 263934971]|
+---------------+----------------------+
only showing top 5 rows
Map-matching OpenStreetMap Nodes to Road Graph elements
Stavroula Rafailia Vlachou (LinkedIn) and Raazesh Sainudiin (LinkedIn).
This project was supported by SENSMETRY through a Data Science Project Internship
between 2022-01-17 and 2022-06-05 to Stavroula R. Vlachou and
Databricks University Alliance with infrastructure credits from AWS to
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.
2022, Uppsala, Sweden
import org.apache.spark.graphx._
import sqlContext.implicits._
import scala.collection.JavaConversions._
import org.apache.spark.sql.functions.{concat, lit}
import org.cusp.bdi.gm.GeoMatch
import org.cusp.bdi.gm.geom.GMPoint
import org.cusp.bdi.gm.geom.GMLineString
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._
import org.apache.spark.graphx._
import sqlContext.implicits._
import scala.collection.JavaConversions._
import org.apache.spark.sql.functions.{concat, lit}
import org.cusp.bdi.gm.GeoMatch
import org.cusp.bdi.gm.geom.GMPoint
import org.cusp.bdi.gm.geom.GMLineString
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._
val edges = spark.read.parquet("dbfs:/graphs/uppsala/edges")
val vertices = spark.read.parquet("dbfs:/graphs/uppsala/vertices").toDF("vertexId", "latitude", "longitude")
edges: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint]
vertices: org.apache.spark.sql.DataFrame = [vertexId: bigint, latitude: double ... 1 more field]
val src_coordinates = edges.join(vertices,vertices("vertexId") === edges("src"), "left_outer").drop("vertexId").withColumnRenamed("latitude", "src_latitude").withColumnRenamed("longitude","src_longitude")
val edge_coordinates = src_coordinates.join(vertices,vertices("vertexId") === src_coordinates("dst")).drop("vertexId").withColumnRenamed("latitude", "dst_latitude").withColumnRenamed("longitude", "dst_longitude")
src_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint ... 2 more fields]
edge_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint ... 4 more fields]
val concat_coordinates = edge_coordinates.select($"src",concat($"src_latitude",lit(" "),$"src_longitude").alias("src_coordinates"), $"dst",concat($"dst_latitude",lit(" "),$"dst_longitude").alias("dst_coordinates"))
concat_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, src_coordinates: string ... 2 more fields]
val linestring_coordinates = concat_coordinates.select($"src", $"dst",concat($"src_coordinates", lit(","), $"dst_coordinates").alias("list_of_coordinates"))
linestring_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint ... 1 more field]
val first = linestring_coordinates.select(concat(lit("LineString:"),$"src",lit("+"), $"dst").alias("LineString"),$"list_of_coordinates")
first: org.apache.spark.sql.DataFrame = [LineString: string, list_of_coordinates: string]
val first_rdd = first.rdd
first_rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[124] at rdd at command-4069571511113730:1
if(!ArcGISRuntimeEnvironment.isInitialized())
{
ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
ArcGISRuntimeEnvironment.initialize()
}
def project_to_meters(lon: String, lat: String): String = {
if(!ArcGISRuntimeEnvironment.isInitialized())
{
ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
ArcGISRuntimeEnvironment.initialize()
}
val initial_point = new Point(lon.toDouble, lat.toDouble, SpatialReference.create(4326))
val reprojection = GeometryEngine.project(initial_point, SpatialReference.create(3035))
reprojection.toString
}
spark.udf.register("project_to_meters", project_to_meters(_:String, _:String):String)
project_to_meters: (lon: String, lat: String)String
res9: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,StringType,Some(List(StringType, StringType)))
val ways_reprojected = first_rdd
  .map(line => line.toString.replaceAll("\\[","").replaceAll("\\]",""))
  .map(line => {
    val parts = line.replaceAll("\"","").split(",")
    val arrCoords = parts.slice(1, parts.length).map(xyStr => {
      val xy = xyStr.split(' ')
      val reprojection = project_to_meters(xy(1).toString, xy(0).toString)
      val coords = reprojection.replaceAll(",","").replaceAll("\\[","").split(" ").slice(1, reprojection.length)
      val xy_new = coords(0).toString + " " + coords(1).toString
      xy_new
    })
    (parts(0).toString, arrCoords)
  })
ways_reprojected: org.apache.spark.rdd.RDD[(String, Array[String])] = MapPartitionsRDD[126] at map at command-4069571511113735:1
val ways_unpacked = ways_reprojected.map(item => item._1.toString + "," + item._2(0).toString + "," + item._2(1).toString)
ways_unpacked: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[127] at map at command-4069571511113736:1
val rdd_first_set = ways_unpacked.mapPartitions(_.map(line =>{val parts = line.replaceAll("\"","").split(',');val arrCoords = parts.slice(1, parts.length).map(xyStr => {val xy = xyStr.split(' ');(xy(0).toDouble.toInt, xy(1).toDouble.toInt)});new GMLineString(parts(0), arrCoords)}))
rdd_first_set: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMLineString] = MapPartitionsRDD[128] at mapPartitions at command-4069571511113737:1
rdd_first_set.take(1)
res10: Array[org.cusp.bdi.gm.geom.GMLineString] = Array(GMLineString(LineString:312363+25735257,[Lscala.Tuple2;@7336e6f4))
def unpack_lat(str: String): String = {
val lat = str.replaceAll(",","").replaceAll("\\[","").split(" ")(2)
return lat
}
spark.udf.register("unpack_lat", unpack_lat(_:String): String)
def unpack_lon(str: String): String = {
val lon = str.replaceAll(",","").replaceAll("\\[","").split(" ")(1)
return lon
}
spark.udf.register("unpack_lon", unpack_lon(_:String): String)
unpack_lat: (str: String)String
unpack_lon: (str: String)String
res11: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,StringType,Some(List(StringType)))
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
val initial_points = vertices.toDF().select(col("vertexId").cast(StringType), col("latitude").cast(StringType), col("longitude").cast(StringType)).withColumn("Point", lit("Point "))
val reprojected_points = initial_points.selectExpr("concat(Point,vertexId) as PointId","project_to_meters(longitude, latitude) as reprojection")
val unpacked_reprojection = reprojected_points.selectExpr("PointId","unpack_lat(reprojection) as new_lat", "unpack_lon(reprojection) as new_lon").rdd
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
initial_points: org.apache.spark.sql.DataFrame = [vertexId: string, latitude: string ... 2 more fields]
reprojected_points: org.apache.spark.sql.DataFrame = [PointId: string, reprojection: string]
unpacked_reprojection: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[133] at rdd at command-2803386459776172:5
unpacked_reprojection.take(1)
res13: Array[org.apache.spark.sql.Row] = Array([Point 25812013,4107235.859946,4749331.992325])
val f = unpacked_reprojection.map(line => {val id = line(0).toString; val lat = line(1).toString; val lon = line(2).toString;id+"," + lat +","+ lon})
f: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[134] at map at command-2803386459776174:1
val rddSecondSet = f.mapPartitions(_.map(line => {val parts = line.replaceAll("\"","").split(',');new GMPoint(parts(0), (parts(2).toDouble.toInt, parts(1).toDouble.toInt))}))
rddSecondSet: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMPoint] = MapPartitionsRDD[135] at mapPartitions at command-2803386459776169:1
rddSecondSet.take(1)
res14: Array[org.cusp.bdi.gm.geom.GMPoint] = Array(GMPoint(Point 25812013,(4749331,4107235)))
val geoMatch = new GeoMatch(false, 16, 150, (-1, -1, -1, -1)) //n (= dimension of the Hilbert curve) should be a power of 2.
geoMatch: org.cusp.bdi.gm.GeoMatch = GeoMatch(false,16,150.0,(-1,-1,-1,-1))
val resultRDD = geoMatch.spatialJoinKNN(rdd_first_set, rddSecondSet, 1, false)
resultRDD: org.apache.spark.rdd.RDD[(org.cusp.bdi.gm.geom.GMPoint, scala.collection.mutable.ListBuffer[org.cusp.bdi.gm.geom.GMLineString])] = MapPartitionsRDD[150] at mapPartitions at GeoMatch.scala:94
resultRDD.map(element => (element._1.payload, element._2.map(_.payload))).filter(element => !(element._2.isEmpty)).take(10)
res15: Array[(String, scala.collection.mutable.ListBuffer[String])] = Array((Point 25735257,ListBuffer(LineString:25735257+3067700641)), (Point 312363,ListBuffer(LineString:3067700668+312363)), (Point 3067700641,ListBuffer(LineString:25735257+3067700641)), (Point 3963994985,ListBuffer(LineString:3963994985+25735257)), (Point 312353,ListBuffer(LineString:312353+25734373)), (Point 3067700668,ListBuffer(LineString:3067700668+312363)), (Point 2206536278,ListBuffer(LineString:3067700641+2206536278)), (Point 25734373,ListBuffer(LineString:25734373+3431600977)))
resultRDD.toDF("k", "line").show(10, false)
+--------------------------------------+------------------------------------------------------------------------------+
|k |line |
+--------------------------------------+------------------------------------------------------------------------------+
|[Point 455006648, [4749516, 4107261]] |[] |
|[Point 25735257, [4749494, 4107152]] |[[LineString:25735257+3067700641, [[4749494, 4107152], [4749524, 4107127]]]] |
|[Point 312363, [4749423, 4107214]] |[[LineString:3067700668+312363, [[4749419, 4107218], [4749423, 4107214]]]] |
|[Point 3067700641, [4749524, 4107127]]|[[LineString:25735257+3067700641, [[4749494, 4107152], [4749524, 4107127]]]] |
|[Point 3431600977, [4749699, 4107100]]|[] |
|[Point 3963994985, [4749540, 4107203]]|[[LineString:3963994985+25735257, [[4749540, 4107203], [4749494, 4107152]]]] |
|[Point 312353, [4749573, 4107212]] |[[LineString:312353+25734373, [[4749573, 4107212], [4749648, 4107146]]]] |
|[Point 3067700668, [4749419, 4107218]]|[[LineString:3067700668+312363, [[4749419, 4107218], [4749423, 4107214]]]] |
|[Point 25812013, [4749331, 4107235]] |[] |
|[Point 2206536278, [4749587, 4107073]]|[[LineString:3067700641+2206536278, [[4749524, 4107127], [4749587, 4107073]]]]|
+--------------------------------------+------------------------------------------------------------------------------+
only showing top 10 rows
Map-Matching Events on a State Space / Road Graph with GeoMatch
Stavroula Rafailia Vlachou (LinkedIn) and Raazesh Sainudiin (LinkedIn).
This project was supported by SENSMETRY through a Data Science Project Internship
between 2022-01-17 and 2022-06-05 to Stavroula R. Vlachou and
Databricks University Alliance with infrastructure credits from AWS to
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.
2022, Uppsala, Sweden
Map-Matching with GeoMatch
GeoMatch is a novel, scalable, and efficient big-data pipeline for large-scale map-matching on Apache Spark. It improves existing spatial big data solutions by utilizing a novel spatial partitioning scheme inspired by Hilbert space-filling curves.
The library can be found in the following git repository GeoMatch.
The necessary files to generate the jar for this work can be found in the following fork https://github.com/StavroulaVlachou/GeoMatch.
Instructions
git clone git@github.com:StavroulaVlachou/GeoMatch.git
cd Common
mvn compile install
cd ../GeoMatch
mvn compile install
The generated jar files can be found within the target directories. Then:
1. In Databricks, choose Create -> Library and upload the packaged jars.
2. Create a Spark 2.4.0 - Scala 2.11 cluster with the uploaded GeoMatch library installed. If you are already running a cluster and have installed the uploaded library on it, you have to detach and re-attach any notebook currently using that cluster.
//This allows easy embedding of publicly available information into any other notebook
//when viewing in git-book just ignore this block - you may have to manually chase the URL in frameIt("URL").
//Example usage:
// displayHTML(frameIt("https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation#Topics_in_LDA",250))
def frameIt( u:String, h:Int ) : String = {
"""<iframe
src=""""+ u+""""
width="95%" height="""" + h + """"
sandbox>
<p>
<a href="http://spark.apache.org/docs/latest/index.html">
Fallback link for browsers that, unlikely, don't support frames
</a>
</p>
</iframe>"""
}
displayHTML(frameIt("https://en.wikipedia.org/wiki/Map_matching",600))
import org.apache.spark.graphx._
import sqlContext.implicits._
import org.apache.spark.sql.functions._
import scala.collection.JavaConversions._
import org.cusp.bdi.gm.GeoMatch
import org.cusp.bdi.gm.geom.GMPoint
import org.cusp.bdi.gm.geom.GMLineString
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._
import org.apache.spark.graphx._
import sqlContext.implicits._
import org.apache.spark.sql.functions._
import scala.collection.JavaConversions._
import org.cusp.bdi.gm.GeoMatch
import org.cusp.bdi.gm.geom.GMPoint
import org.cusp.bdi.gm.geom.GMLineString
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._
State Space / Road Graph
- In this work, we wish to match points of interest (events) against states of a State Space. The State Space consists of elements of the Road Graph. Specifically, a state is either a vertex that corresponds to an intersection point, or an edge, which is essentially a road segment.
- First we obtain the nodes and ways of the underlying road network.
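The two kinds of state described above can be captured by a small algebraic data type (a sketch with hypothetical names, not part of the notebooks' actual pipeline). The `label` method mirrors the string ids the notebooks build: intersections become `LineString:<node>` and road segments become `LineString:<src>+<dst>`:

```scala
// A state of the road-graph state space is either an intersection vertex
// or a road-segment edge (illustrative names, for the sketch only).
sealed trait RoadState
case class IntersectionState(nodeId: Long) extends RoadState
case class SegmentState(src: Long, dst: Long) extends RoadState

object RoadState {
  // Render a state the way the notebooks label them.
  def label(st: RoadState): String = st match {
    case IntersectionState(n) => s"LineString:$n"
    case SegmentState(a, b)   => s"LineString:$a+$b"
  }
}
```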
spark.conf.set("spark.sql.parquet.binaryAsString", true)
val nodes_df = spark.read.parquet("dbfs:/datasets/osm/lithuania/lithuania.osm.pbf.node.parquet")
val ways_df = spark.read.parquet("dbfs:/datasets/osm/lithuania/lithuania.osm.pbf.way.parquet")
nodes_df: org.apache.spark.sql.DataFrame = [id: bigint, version: int ... 7 more fields]
ways_df: org.apache.spark.sql.DataFrame = [id: bigint, version: int ... 6 more fields]
//convert the nodes to Dataset containing the fields of interest
case class NodeEntry(nodeId: Long, latitude: Double, longitude: Double, tags: Seq[String])
val nodeDS = nodes_df.map(node =>
NodeEntry(node.getAs[Long]("id"),
node.getAs[Double]("latitude"),
node.getAs[Double]("longitude"),
node.getAs[Seq[Row]]("tags").map{case Row(key:String, value:String) => value}
))
defined class NodeEntry
nodeDS: org.apache.spark.sql.Dataset[NodeEntry] = [nodeId: bigint, latitude: double ... 2 more fields]
- The next step is to obtain the intersection points and associate them with their corresponding vertices on the graph.
val intersections = spark.read.parquet("dbfs:/LT/intersections")
intersections: org.apache.spark.sql.DataFrame = [intersectionNode: bigint]
intersections.count //in this area there are 162325 intersection points
res3: Long = 162325
- GeoMatch deals with points whose coordinates are measured in meters. However, OSM data have their coordinates expressed in degrees (WGS84, spatial reference index 4326). Thus, for each point that is to participate in the matching we identify its OSM coordinates and reproject them onto the European Grid (spatial reference index 3035).
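The ArcGIS runtime performs the exact EPSG:4326 to EPSG:3035 transformation. As a rough illustration of why a metric projection is needed at all, the following plain-Scala sketch converts a small lon/lat offset into approximate metres using the equirectangular approximation (an illustration only, not the European Grid projection):

```scala
// Rough illustration only: approximate size in metres of a degree offset
// near a given latitude (equirectangular approximation, NOT EPSG:3035).
object DegreesToMetres {
  val EarthRadiusM = 6371000.0
  def approxMetres(dLonDeg: Double, dLatDeg: Double, atLatDeg: Double): (Double, Double) = {
    val latRad = math.toRadians(atLatDeg)
    val dx = math.toRadians(dLonDeg) * EarthRadiusM * math.cos(latRad) // east-west shrinks with latitude
    val dy = math.toRadians(dLatDeg) * EarthRadiusM                    // north-south is roughly constant
    (dx, dy)
  }
}
```

Near 60°N a degree of longitude is only about half as long as a degree of latitude, which is why metric distance tolerances (such as GeoMatch's radius below) must be applied after reprojection, not on raw degrees.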
val intersection_points = nodeDS.join(intersections, intersections("intersectionNode") === nodeDS("nodeId")).drop("tags", "nodeId").select("intersectionNode", "latitude", "longitude")
intersection_points: org.apache.spark.sql.DataFrame = [intersectionNode: bigint, latitude: double ... 1 more field]
val concat_coordinates = intersection_points.select($"intersectionNode",concat($"latitude",lit(" "),$"longitude").alias("coordinates"))
concat_coordinates: org.apache.spark.sql.DataFrame = [intersectionNode: bigint, coordinates: string]
val firstIntersectionStates = concat_coordinates.select(concat(lit("LineString:"),$"intersectionNode").alias("LineString"),$"coordinates")
val firstIntersectionStates_rdd = firstIntersectionStates.rdd
firstIntersectionStates: org.apache.spark.sql.DataFrame = [LineString: string, coordinates: string]
firstIntersectionStates_rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[318] at rdd at command-3336180278405410:2
if(!ArcGISRuntimeEnvironment.isInitialized())
{
ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
ArcGISRuntimeEnvironment.initialize()
}
def project_to_meters(lon: String, lat: String): String = {
if(!ArcGISRuntimeEnvironment.isInitialized())
{
ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
ArcGISRuntimeEnvironment.initialize()
}
val initial_point = new Point(lon.toDouble, lat.toDouble, SpatialReference.create(4326))
val reprojection = GeometryEngine.project(initial_point, SpatialReference.create(3035))
reprojection.toString
}
spark.udf.register("project_to_meters", project_to_meters(_:String, _:String):String)
project_to_meters: (lon: String, lat: String)String
res8: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,StringType,Some(List(StringType, StringType)))
val intersections_reprojected = firstIntersectionStates_rdd.map(line => line.toString.replaceAll("\\[","").replaceAll("\\]",""))
.map(line => {val parts = line.replaceAll("\"","").split(",");
val arrCoords = parts.slice(1,parts.length)
.map(xyStr => {val xy = xyStr.split(" ");
val reprojection = project_to_meters(xy(1).toString, xy(0).toString);
val coords = reprojection.replaceAll(",","").replaceAll("\\[","").split(" ").slice(1,reprojection.length);
val xy_new = coords(0).toString +" "+ coords(1).toString;xy_new});
(parts(0).toString, arrCoords)})
intersections_reprojected: org.apache.spark.rdd.RDD[(String, Array[String])] = MapPartitionsRDD[320] at map at command-3336180278405413:2
val intersections_unpacked = intersections_reprojected.map(item => item._1.toString + "," + item._2(0).toString)
intersections_unpacked: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[321] at map at command-3336180278405414:1
val rdd_first_set_intersections = intersections_unpacked.mapPartitions(_.map(line =>{val parts = line.replaceAll("\"","").split(',');val arrCoords = parts.slice(1, parts.length).map(xyStr => {val xy = xyStr.split(' ');(xy(0).toDouble.toInt, xy(1).toDouble.toInt)});new GMPoint(parts(0), arrCoords(0))}))
rdd_first_set_intersections: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMPoint] = MapPartitionsRDD[322] at mapPartitions at command-3336180278405415:1
- The next step is to fetch the events that are to be map-matched and transform their coordinates as well. Note that for this work, the events of interest are accidents recorded within Lithuania's road network.
val events = spark.read.format("csv").load("/FileStore/tables/LTnodes.csv").rdd.map(line => line.toString)
events.count() //there are 11989 events to be matched
events: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[336] at map at command-3336180278405417:1
res9: Long = 11989
val all_accidents = spark.read.format("csv").load("/FileStore/tables/LTnodes.csv").toDF("PointId", "longitude", "latitude")
all_accidents: org.apache.spark.sql.DataFrame = [PointId: string, longitude: string ... 1 more field]
val rddSecondSet = events.mapPartitions(_.map(line => {val parts = line.replaceAll("\"","").replaceAll("\\[","").replaceAll("\\]","").split(',');new GMPoint(parts(0), (parts(1).toDouble.toInt, parts(2).toDouble.toInt))}))
rddSecondSet: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMPoint] = MapPartitionsRDD[345] at mapPartitions at command-3336180278405419:1
1st round of Map Matching
- In this first round the focus is on the intersection points and the events occurring within a predefined distance of them. Here the distance tolerance is set to 20 meters and the number of nearest neighbours to find is 1.
val geoMatch = new GeoMatch(false, 256, 20, (-1, -1, -1, -1)) // dimension of the Hilbert curve = 256 (the default value); should be a power of 2
geoMatch: org.cusp.bdi.gm.GeoMatch = GeoMatch(false,256,20.0,(-1,-1,-1,-1))
val resultRDD = geoMatch.spatialJoinKNN(rdd_first_set_intersections, rddSecondSet, 1, false)
resultRDD: org.apache.spark.rdd.RDD[(org.cusp.bdi.gm.geom.GMPoint, scala.collection.mutable.ListBuffer[org.cusp.bdi.gm.geom.GMPoint])] = MapPartitionsRDD[358] at mapPartitions at GeoMatch.scala:94
- 3743 events (out of 11989) are found to lie within a 20-meter radius of intersection points.
resultRDD.map(element => (element._1.payload, element._2.map(_.payload))).filter(element => !(element._2.isEmpty)).count()
res11: Long = 3743
val result_first_round = resultRDD.map(element => (element._1.payload, element._2.map(_.payload))).filter(element => !(element._2.isEmpty)).map(element => (element._1, element._2(0))).toDF("PointId", "State")
result_first_round: org.apache.spark.sql.DataFrame = [PointId: string, State: string]
val intersection_counts = result_first_round.groupBy("State").count
intersection_counts: org.apache.spark.sql.DataFrame = [State: string, count: bigint]
- One of the advantages of GeoMatch is that it carries all of the data points to be matched throughout the pipeline, even when no match is found. This is key in this case, since the points that were not matched successfully during this first round are subject to a second iteration, where they are matched against the remainder of the State Space.
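This carrying of unmatched points can be illustrated with a small pure-Scala sketch. The `(point, neighbour-list)` pairs below are made up for illustration and only mimic the shape of GeoMatch's kNN result, where an empty neighbour list signals that no match was found within the tolerance:

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for GeoMatch's (point, neighbour-list) result pairs:
// an empty neighbour list means the point found no match in this round.
val results = Seq(
  ("Point A", ListBuffer("LineString:1")),
  ("Point B", ListBuffer.empty[String]),
  ("Point C", ListBuffer("LineString:2"))
)

// Split the results the same way the cells below filter resultRDD:
// matched points go on to counting, unmatched ones to the second round.
val (matched, unmatched) = results.partition(_._2.nonEmpty)
```

Because every input point is present in the result, the unmatched ones can simply be filtered out and fed into the second round rather than being lost.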
val unmatched_events = resultRDD.filter(element => (element._2.isEmpty)).map(element => element._1.payload).toDF("id")
val second_set_second_round = unmatched_events.join(all_accidents, unmatched_events("id") === all_accidents("PointId")).drop("id").rdd.map(line => line.toString)
val rddSecondSetSecondRound = second_set_second_round
.mapPartitions(_.map(line => {val parts = line.replaceAll("\"","").replaceAll("\\[","").replaceAll("\\]","").split(',');
new GMPoint(parts(0),(parts(1).toDouble.toInt, parts(2).toDouble.toInt))}))
unmatched_events: org.apache.spark.sql.DataFrame = [id: string]
second_set_second_round: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[373] at map at command-3336180278405428:2
rddSecondSetSecondRound: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMPoint] = MapPartitionsRDD[374] at mapPartitions at command-3336180278405428:4
- The remainder of the State Space consists of the edges of the road graph. In the following cells, we fetch these edges and associate them with their OSM coordinates and their reprojection.
val edges = spark.read.parquet("dbfs:/_checkpoint/edges_LT_initial") //edges of G0
val vertices = spark.read.parquet("dbfs:/_checkpoint/vertices_LT_initial").toDF("vertexId", "latitude", "longitude") //vertices of G0
edges: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint]
vertices: org.apache.spark.sql.DataFrame = [vertexId: bigint, latitude: double ... 1 more field]
val src_coordinates = edges.join(vertices,vertices("vertexId") === edges("src"), "left_outer").drop("vertexId").withColumnRenamed("latitude", "src_latitude").withColumnRenamed("longitude","src_longitude")
val edge_coordinates = src_coordinates.join(vertices,vertices("vertexId") === src_coordinates("dst")).drop("vertexId").withColumnRenamed("latitude", "dst_latitude").withColumnRenamed("longitude", "dst_longitude")
src_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint ... 2 more fields]
edge_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint ... 4 more fields]
val concat_coordinates = edge_coordinates.select($"src",concat($"src_latitude",lit(" "),$"src_longitude").alias("src_coordinates"), $"dst",concat($"dst_latitude",lit(" "),$"dst_longitude").alias("dst_coordinates"))
concat_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, src_coordinates: string ... 2 more fields]
val linestring_coordinates = concat_coordinates.select($"src", $"dst",concat($"src_coordinates", lit(","), $"dst_coordinates").alias("list_of_coordinates"))
linestring_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint ... 1 more field]
val first = linestring_coordinates.select(concat(lit("LineString:"),$"src",lit("+"), $"dst").alias("LineString"),$"list_of_coordinates")
first: org.apache.spark.sql.DataFrame = [LineString: string, list_of_coordinates: string]
val ways_reprojected = first.rdd.map(line => line.toString.replaceAll("\\[","").replaceAll("\\]","")).map(line => {val parts = line.replaceAll("\"","").split(",");val arrCoords = parts.slice(1,parts.length).map(xyStr => {val xy = xyStr.split(' ');val reprojection = project_to_meters(xy(1).toString, xy(0).toString);val coords = reprojection.replaceAll(",","").replaceAll("\\[","").split(" ").slice(1,reprojection.length);val xy_new = coords(0).toString +" "+ coords(1).toString;xy_new});(parts(0).toString, arrCoords)})
ways_reprojected: org.apache.spark.rdd.RDD[(String, Array[String])] = MapPartitionsRDD[389] at map at command-3336180278405435:1
val ways_unpacked = ways_reprojected.map(item => item._1.toString + "," + item._2(0).toString + "," + item._2(1).toString)
ways_unpacked: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[390] at map at command-3336180278405436:1
val rdd_first_set = ways_unpacked
.mapPartitions(_.map(line =>{val parts = line.replaceAll("\"","").split(',');
val arrCoords = parts.slice(1, parts.length).map(xyStr => {val xy = xyStr.split(' ');(xy(0).toDouble.toInt, xy(1).toDouble.toInt)});
new GMLineString(parts(0), arrCoords)}))
rdd_first_set: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMLineString] = MapPartitionsRDD[391] at mapPartitions at command-3336180278405437:2
- In this second round of map-matching, the distance threshold is set to 200 meters. The dimension of the Hilbert index curve is again set to its default value (256) and the number of nearest neighbours to be found is 1.
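The power-of-2 requirement on the curve dimension can be checked with a one-line bit trick; the helper below is a sketch of our own, not part of GeoMatch's API:

```scala
// Sanity check (not part of GeoMatch's API): the Hilbert-curve dimension
// passed to GeoMatch (here 256) should be a positive power of 2,
// unlike the distance tolerance, which can be any positive number.
def isPowerOfTwo(n: Int): Boolean = n > 0 && (n & (n - 1)) == 0
```

A power of 2 has a single set bit, so `n & (n - 1)` clears it and yields 0 exactly for powers of 2.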
val geoMatchSecond = new GeoMatch(false, 256, 200, (-1, -1, -1, -1))
geoMatchSecond: org.cusp.bdi.gm.GeoMatch = GeoMatch(false,256,200.0,(-1,-1,-1,-1))
val resultRDDsecond = geoMatchSecond.spatialJoinKNN(rdd_first_set, rddSecondSetSecondRound, 1, false)
resultRDDsecond: org.apache.spark.rdd.RDD[(org.cusp.bdi.gm.geom.GMPoint, scala.collection.mutable.ListBuffer[org.cusp.bdi.gm.geom.GMLineString])] = MapPartitionsRDD[404] at mapPartitions at GeoMatch.scala:94
- The number of events that do not lie within a 200-meter radius of any road segment is 269.
resultRDDsecond.map(element => (element._1.payload, element._2.map(_.payload))).filter(element => (element._2.isEmpty)).count()
res20: Long = 269
- We are interested in how many events are matched against each state.
val res = resultRDDsecond.map(element => (element._1.payload, element._2.map(_.payload))).filter(element => !(element._2.isEmpty))
res: org.apache.spark.rdd.RDD[(String, scala.collection.mutable.ListBuffer[String])] = MapPartitionsRDD[408] at filter at command-3336180278405445:1
val result_second_round = res.map(element => (element._1, element._2(0))).toDF("PointId", "State")
result_second_round: org.apache.spark.sql.DataFrame = [PointId: string, State: string]
val edge_counts = result_second_round.groupBy("State").count
edge_counts: org.apache.spark.sql.DataFrame = [State: string, count: bigint]
val state_counts = edge_counts.union(intersection_counts)
state_counts: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [State: string, count: bigint]
val all_intersection_states = rdd_first_set_intersections.toDF("stateId", "coords").drop("coords")
val all_edge_states = rdd_first_set.toDF("stateId", "coords").drop("coords")
val all_states = all_intersection_states.union(all_edge_states)
all_states.count //number of states
all_intersection_states: org.apache.spark.sql.DataFrame = [stateId: string]
all_edge_states: org.apache.spark.sql.DataFrame = [stateId: string]
all_states: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [stateId: string]
res24: Long = 399394
- Find the states against which no event has been matched, assign them a count of 0, and union them with the rest of `state_counts`. This way, each state in the State Space is assigned a numerical value representing the number of accidents that have occurred within that state.
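The left-outer join plus `na.fill(0)` pattern can be sketched in plain Scala with ordinary collections; the state ids below are made up for illustration:

```scala
// Plain-Scala sketch of the left-outer join plus na.fill(0) used below:
// every state ends up with a count, defaulting to 0 when no event matched it.
// The state ids are hypothetical.
val allStates   = Seq("LineString:1", "LineString:2", "LineString:3")
val stateCounts = Map("LineString:1" -> 5L, "LineString:3" -> 2L)

val filled: Map[String, Long] =
  allStates.map(s => s -> stateCounts.getOrElse(s, 0L)).toMap
```

Here `getOrElse(s, 0L)` plays the role of `na.fill(0)`: states absent from the counts table get an explicit zero instead of a null.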
val s1 = all_states.join(state_counts, all_states("stateId") === state_counts("State"), "left_outer").drop("State")
val s_final = s1.na.fill(0)
s1: org.apache.spark.sql.DataFrame = [stateId: string, count: bigint]
s_final: org.apache.spark.sql.DataFrame = [stateId: string, count: bigint]
s_final.distinct.agg(sum("count")).show() //11720 events in total successfully matched
+----------+
|sum(count)|
+----------+
| 11720|
+----------+
def trim_id(stateId: String): String = {
val res = stateId.split(":")(1)
return res
}
def trim_point(pointId: String): String = {
val res = pointId.split(" ")(1)
return res
}
spark.udf.register("trim_point", trim_point(_:String): String)
spark.udf.register("trim_id", trim_id(_:String): String)
trim_id: (stateId: String)String
trim_point: (pointId: String)String
res28: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,StringType,Some(List(StringType)))
val total_result = result_first_round.union(result_second_round)
val trimed_total_result = total_result.selectExpr("trim_point(PointId) as point", "trim_id(State) as state")
total_result: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [PointId: string, State: string]
trimed_total_result: org.apache.spark.sql.DataFrame = [point: string, state: string]
- Return here after notebook 034_06SimulatingArrivalTimesNHPP_Inversion
- We want to map the simulated graph elements to exact locations
val df = spark.read.parquet("dbfs:/roadSafety/simulation_location").toDF("simulated_location", "arrival_time")
val location_id = df.select("simulated_location")
df: org.apache.spark.sql.DataFrame = [simulated_location: string, arrival_time: double]
location_id: org.apache.spark.sql.DataFrame = [simulated_location: string]
import org.apache.spark.sql.functions._
val intersection_samples = location_id.join(nodes_df, col("simulated_location") === col("id")).select("simulated_location", "latitude", "longitude")
intersection_samples.count
val edge_ids = edge_coordinates.withColumn("edge_id", concat(col("src"), lit("+"), col("dst")))
val edge_samples = location_id.join(edge_ids, col("simulated_location") === col("edge_id")).drop("src", "dst", "edge_id")
import org.apache.spark.sql.functions._
intersection_samples: org.apache.spark.sql.DataFrame = [simulated_location: string, latitude: double ... 1 more field]
edge_ids: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint ... 5 more fields]
edge_samples: org.apache.spark.sql.DataFrame = [simulated_location: string, src_latitude: double ... 3 more fields]
import org.apache.spark.mllib.random.RandomRDDs
val random_edge_coordinates = edge_samples.withColumn("random_sample", rand())
import org.apache.spark.mllib.random.RandomRDDs
random_edge_coordinates: org.apache.spark.sql.DataFrame = [simulated_location: string, src_latitude: double ... 4 more fields]
- For each simulated edge, generate a uniform random sample and scale it according to the coordinates of the edge's source and destination
def random_lat(src_lat: Double, dst_lat: Double, sample: Double): Double = {
val lat_min = src_lat.min(dst_lat)
val lat_max = src_lat.max(dst_lat)
val lat = sample * (lat_max - lat_min) + lat_min
return lat
}
def random_lon(src_lon: Double, dst_lon: Double, sample: Double): Double = {
val lon_min = src_lon.min(dst_lon)
val lon_max = src_lon.max(dst_lon)
val lon = sample * (lon_max - lon_min) + lon_min
return lon
}
spark.udf.register("random_lat", random_lat(_: Double, _: Double, _: Double): Double)
spark.udf.register("random_lon", random_lon(_: Double, _: Double, _: Double): Double)
val random_coordinates = random_edge_coordinates.selectExpr("random_lat(src_latitude, dst_latitude, random_sample) as latitude", "random_lon(src_longitude, dst_longitude, random_sample) as longitude")
random_lat: (src_lat: Double, dst_lat: Double, sample: Double)Double
random_lon: (src_lon: Double, dst_lon: Double, sample: Double)Double
random_coordinates: org.apache.spark.sql.DataFrame = [latitude: double, longitude: double]
val df_final = random_coordinates.union(intersection_samples.select("latitude", "longitude"))
df_final.count()
df_final: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [latitude: double, longitude: double]
res36: Long = 12089
df_final.show()
Output:
+------------------+------------------+
| latitude| longitude|
+------------------+------------------+
|54.66xxx |25.29yyy |
+------------------+------------------+
Map-Matching Events on a State Space / Coarsened Road Graph with GeoMatch
Stavroula Rafailia Vlachou (LinkedIn) and Raazesh Sainudiin (LinkedIn).
This project was supported by SENSMETRY through a Data Science Project Internship
between 2022-01-17 and 2022-06-05 to Stavroula R. Vlachou and
Databricks University Alliance with infrastructure credits from AWS to
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.
2022, Uppsala, Sweden
import org.apache.spark.graphx._
import sqlContext.implicits._
import scala.collection.JavaConversions._
import org.cusp.bdi.gm.GeoMatch
import org.cusp.bdi.gm.geom.GMPoint
import org.cusp.bdi.gm.geom.GMLineString
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._
import org.apache.spark.graphx._
import sqlContext.implicits._
import scala.collection.JavaConversions._
import org.cusp.bdi.gm.GeoMatch
import org.cusp.bdi.gm.geom.GMPoint
import org.cusp.bdi.gm.geom.GMLineString
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._
spark.conf.set("spark.sql.parquet.binaryAsString", true)
val nodes_df = spark.read.parquet("dbfs:/datasets/osm/lithuania/lithuania.osm.pbf.node.parquet")
val ways_df = spark.read.parquet("dbfs:/datasets/osm/lithuania/lithuania.osm.pbf.way.parquet")
nodes_df: org.apache.spark.sql.DataFrame = [id: bigint, version: int ... 7 more fields]
ways_df: org.apache.spark.sql.DataFrame = [id: bigint, version: int ... 6 more fields]
//convert the nodes to Dataset containing the fields of interest
case class NodeEntry(nodeId: Long, latitude: Double, longitude: Double, tags: Seq[String])
val nodeDS = nodes_df.map(node =>
NodeEntry(node.getAs[Long]("id"),
node.getAs[Double]("latitude"),
node.getAs[Double]("longitude"),
node.getAs[Seq[Row]]("tags").map{case Row(key:String, value:String) => value}
)).cache()
defined class NodeEntry
nodeDS: org.apache.spark.sql.Dataset[NodeEntry] = [nodeId: bigint, latitude: double ... 2 more fields]
The first step is to obtain the State Space. The State Space consists of road segments and intersection points. The road segments correspond to the edges of the graph, while the intersection points can be retrieved from the ways and nodes datasets as those nodes that lie on at least one way. All coordinates should be in the spatial reference system EPSG:3035. To implement the map matching, it is better to keep all intermediate points of each edge.
display(dbutils.fs.ls("dbfs:/LT"))
| path | name | size |
|---|---|---|
| dbfs:/LT/intersections/ | intersections/ | 0.0 |
val intersections = spark.read.parquet("dbfs:/LT/intersections")
intersections.show(1)
+----------------+
|intersectionNode|
+----------------+
| 270958413|
+----------------+
only showing top 1 row
intersections: org.apache.spark.sql.DataFrame = [intersectionNode: bigint]
intersections.count
res5: Long = 162325
The next step is to obtain the coordinates of the intersection points and convert them into decimal degrees.
val intersection_points = nodeDS.join(intersections, intersections("intersectionNode") === nodeDS("nodeId")).drop("tags", "nodeId").select("intersectionNode", "latitude", "longitude")
intersection_points.show(1)
+----------------+----------+------------------+
|intersectionNode| latitude| longitude|
+----------------+----------+------------------+
| 15389886|54.7309125|25.239701200000003|
+----------------+----------+------------------+
only showing top 1 row
intersection_points: org.apache.spark.sql.DataFrame = [intersectionNode: bigint, latitude: double ... 1 more field]
intersection_points.count()
res8: Long = 162325
import org.apache.spark.sql.functions.{concat, lit}
val concat_coordinates = intersection_points.select($"intersectionNode",concat($"latitude",lit(" "),$"longitude").alias("coordinates"))
concat_coordinates.show(1, false)
+----------------+-----------------------------+
|intersectionNode|coordinates |
+----------------+-----------------------------+
|15389886 |54.7309125 25.239701200000003|
+----------------+-----------------------------+
only showing top 1 row
import org.apache.spark.sql.functions.{concat, lit}
concat_coordinates: org.apache.spark.sql.DataFrame = [intersectionNode: bigint, coordinates: string]
val firstIntersectionStates = concat_coordinates.select(concat(lit("LineString:"),$"intersectionNode").alias("LineString"),$"coordinates")
firstIntersectionStates.show(1, false)
val firstIntersectionStates_rdd = firstIntersectionStates.rdd
firstIntersectionStates_rdd.take(1)
+-------------------+-----------------------------+
|LineString |coordinates |
+-------------------+-----------------------------+
|LineString:15389886|54.7309125 25.239701200000003|
+-------------------+-----------------------------+
only showing top 1 row
firstIntersectionStates: org.apache.spark.sql.DataFrame = [LineString: string, coordinates: string]
firstIntersectionStates_rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[545] at rdd at command-197980058855229:3
res11: Array[org.apache.spark.sql.Row] = Array([LineString:15389886,54.7309125 25.239701200000003])
if(!ArcGISRuntimeEnvironment.isInitialized())
{
ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
ArcGISRuntimeEnvironment.initialize()
}
def project_to_meters(lon: String, lat: String): String = {
if(!ArcGISRuntimeEnvironment.isInitialized())
{
ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
ArcGISRuntimeEnvironment.initialize()
}
val initial_point = new Point(lon.toDouble, lat.toDouble, SpatialReference.create(4326))
val reprojection = GeometryEngine.project(initial_point, SpatialReference.create(3035))
reprojection.toString
}
spark.udf.register("project_to_meters", project_to_meters(_:String, _:String):String)
project_to_meters: (lon: String, lat: String)String
res14: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,StringType,Some(List(StringType, StringType)))
val intersections_reprojected = firstIntersectionStates_rdd.map(line => line.toString.replaceAll("\\[","").replaceAll("\\]",""))
.map(line => {val parts = line.replaceAll("\"","").split(",");
val arrCoords = parts.slice(1,parts.length)
.map(xyStr => {val xy = xyStr.split(" ");
val reprojection = project_to_meters(xy(1).toString, xy(0).toString);
val coords = reprojection.replaceAll(",","").replaceAll("\\[","").split(" ").slice(1,reprojection.length);
val xy_new = coords(0).toString +" "+ coords(1).toString;xy_new});
(parts(0).toString, arrCoords)})
intersections_reprojected: org.apache.spark.rdd.RDD[(String, Array[String])] = MapPartitionsRDD[547] at map at command-197980058855232:1
intersections_reprojected.take(1)
res15: Array[(String, Array[String])] = Array((LineString:15389886,Array(5294624.872733 3617234.130316)))
val intersections_unpacked = intersections_reprojected.map(item => item._1.toString + "," + item._2(0).toString)
intersections_unpacked.take(1)
intersections_unpacked: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[548] at map at command-197980058855234:1
res16: Array[String] = Array(LineString:15389886,5294624.872733 3617234.130316)
val rdd_first_set_intersections = intersections_unpacked.mapPartitions(_.map(line =>{val parts = line.replaceAll("\"","").split(',');val arrCoords = parts.slice(1, parts.length).map(xyStr => {val xy = xyStr.split(' ');(xy(0).toDouble.toInt, xy(1).toDouble.toInt)});new GMPoint(parts(0), arrCoords(0))}))
rdd_first_set_intersections: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMPoint] = MapPartitionsRDD[549] at mapPartitions at command-197980058855235:1
rdd_first_set_intersections.take(1)
res17: Array[org.cusp.bdi.gm.geom.GMPoint] = Array(GMPoint(LineString:15389886,(5294624,3617234)))
Next, we need to obtain the set of points that are to be map-matched. In this case, the set of points corresponds to the accident events occurring in Lithuania (LT).
val events = spark.read.format("csv").load("/FileStore/tables/LTnodes.csv").rdd.map(line => line.toString)
events: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[563] at map at command-197980058855239:1
events.take(1)
Output:
Array([Point LT2019XXX,52aaa.18bbb,36ccc.21ddd])
val all_accidents = spark.read.format("csv").load("/FileStore/tables/LTnodes.csv").toDF("PointId", "longitude", "latitude")
all_accidents: org.apache.spark.sql.DataFrame = [PointId: string, longitude: string ... 1 more field]
val rddSecondSet = events.mapPartitions(_.map(line => {val parts = line.replaceAll("\"","").replaceAll("\\[","").replaceAll("\\]","").split(',');new GMPoint(parts(0), (parts(1).toDouble.toInt, parts(2).toDouble.toInt))}))
rddSecondSet: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMPoint] = MapPartitionsRDD[572] at mapPartitions at command-197980058855241:1
Implement Map Matching
val geoMatch = new GeoMatch(false, 256, 20, (-1, -1, -1, -1)) // n (= dimension of the Hilbert curve) should be a power of 2
geoMatch: org.cusp.bdi.gm.GeoMatch = GeoMatch(false,256,20.0,(-1,-1,-1,-1))
val resultRDD = geoMatch.spatialJoinKNN(rdd_first_set_intersections, rddSecondSet, 1, false)
resultRDD: org.apache.spark.rdd.RDD[(org.cusp.bdi.gm.geom.GMPoint, scala.collection.mutable.ListBuffer[org.cusp.bdi.gm.geom.GMPoint])] = MapPartitionsRDD[585] at mapPartitions at GeoMatch.scala:94
The output of the above command with IDs and locations anonymised is as follows:
+----------------------------------------+---------------------------------------------+
|k |line |
+----------------------------------------+---------------------------------------------+
|[Point LT20xyABCDEF, [521xxxx, 362yyyy]]|[[LineString:1254578sss, [521zzzz, 362zzzz]]]|
+----------------------------------------+---------------------------------------------+
only showing top 1 row
resultRDD.map(element => (element._1.payload, element._2.map(_.payload))).filter(element => (element._2.isEmpty)).count()
res19: Long = 8246
resultRDD.map(element => (element._1.payload, element._2.map(_.payload))).filter(element => !(element._2.isEmpty)).count()
res20: Long = 3743
val unmatched_events = resultRDD.filter(element => (element._2.isEmpty)).map(element => element._1.payload).toDF("id")
val second_set_second_round = unmatched_events.join(all_accidents, unmatched_events("id") === all_accidents("PointId")).drop("id").rdd.map(line => line.toString)
val rddSecondSetSecondRound = second_set_second_round.mapPartitions(_.map(line => {val parts = line.replaceAll("\"","").replaceAll("\\[","").replaceAll("\\]","").split(',');new GMPoint(parts(0), (parts(1).toDouble.toInt, parts(2).toDouble.toInt))}))
unmatched_events: org.apache.spark.sql.DataFrame = [id: string]
second_set_second_round: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[599] at map at command-197980058855248:3
rddSecondSetSecondRound: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMPoint] = MapPartitionsRDD[600] at mapPartitions at command-197980058855248:5
val edges = spark.read.parquet("dbfs:/_checkpoint/edges_LT_100")
val vertices = spark.read.parquet("dbfs:/_checkpoint/vertices_LT_100").toDF("vertexId", "latitude", "longitude")
edges: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint]
vertices: org.apache.spark.sql.DataFrame = [vertexId: bigint, latitude: double ... 1 more field]
edges.show(1)
+--------+----------+
| src| dst|
+--------+----------+
|31451266|4397542060|
+--------+----------+
only showing top 1 row
val src_coordinates = edges.join(vertices,vertices("vertexId") === edges("src"), "left_outer").drop("vertexId").withColumnRenamed("latitude", "src_latitude").withColumnRenamed("longitude","src_longitude")
val edge_coordinates = src_coordinates.join(vertices,vertices("vertexId") === src_coordinates("dst")).drop("vertexId").withColumnRenamed("latitude", "dst_latitude").withColumnRenamed("longitude", "dst_longitude")
src_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint ... 2 more fields]
edge_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint ... 4 more fields]
import org.apache.spark.sql.functions.{concat, lit}
val concat_coordinates = edge_coordinates.select($"src",concat($"src_latitude",lit(" "),$"src_longitude").alias("src_coordinates"), $"dst",concat($"dst_latitude",lit(" "),$"dst_longitude").alias("dst_coordinates"))
import org.apache.spark.sql.functions.{concat, lit}
concat_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, src_coordinates: string ... 2 more fields]
concat_coordinates.show(1, false)
+----------+---------------------+--------+-------------------------------------+
|src |src_coordinates |dst |dst_coordinates |
+----------+---------------------+--------+-------------------------------------+
|4095919448|54.6666894 25.1168508|31447217|54.666942600000006 25.115928200000003|
+----------+---------------------+--------+-------------------------------------+
only showing top 1 row
val linestring_coordinates = concat_coordinates.select($"src", $"dst",concat($"src_coordinates", lit(","), $"dst_coordinates").alias("list_of_coordinates"))
linestring_coordinates: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint ... 1 more field]
linestring_coordinates.show(1, false)
+----------+--------+-----------------------------------------------------------+
|src |dst |list_of_coordinates |
+----------+--------+-----------------------------------------------------------+
|4095919448|31447217|54.6666894 25.1168508,54.666942600000006 25.115928200000003|
+----------+--------+-----------------------------------------------------------+
only showing top 1 row
val first = linestring_coordinates.select(concat(lit("LineString:"),$"src",lit("+"), $"dst").alias("LineString"),$"list_of_coordinates")
first: org.apache.spark.sql.DataFrame = [LineString: string, list_of_coordinates: string]
val first_rdd = first.rdd
first_rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[675] at rdd at command-197980058855258:1
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._
if(!ArcGISRuntimeEnvironment.isInitialized())
{
ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
ArcGISRuntimeEnvironment.initialize()
}
def project_to_meters(lon: String, lat: String): String = {
if(!ArcGISRuntimeEnvironment.isInitialized())
{
ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
ArcGISRuntimeEnvironment.initialize()
}
val initial_point = new Point(lon.toDouble, lat.toDouble, SpatialReference.create(4326))
val reprojection = GeometryEngine.project(initial_point, SpatialReference.create(3035))
reprojection.toString
}
spark.udf.register("project_to_meters", project_to_meters(_:String, _:String):String)
project_to_meters: (lon: String, lat: String)String
res31: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,StringType,Some(List(StringType, StringType)))
first_rdd.take(1)
res32: Array[org.apache.spark.sql.Row] = Array([LineString:4095919448+31447217,54.6666894 25.1168508,54.666942600000006 25.115928200000003])
val ways_reprojected = first_rdd.map(line => line.toString.replaceAll("\\[","").replaceAll("\\]",""))
.map(line => {val parts = line.replaceAll("\"","").split(",");
val arrCoords = parts.slice(1,parts.length)
.map(xyStr => {val xy = xyStr.split(" ");
val reprojection = project_to_meters(xy(1).toString, xy(0).toString);
val coords = reprojection.replaceAll(",","").replaceAll("\\[","").split(" ").slice(1,reprojection.length);
val xy_new = coords(0).toString +" "+ coords(1).toString;xy_new});
(parts(0).toString,arrCoords)})
ways_reprojected: org.apache.spark.rdd.RDD[(String, Array[String])] = MapPartitionsRDD[677] at map at command-197980058855264:1
ways_reprojected.take(1)
res33: Array[(String, Array[String])] = Array((LineString:4095919448+31447217,Array(5288428.785893 3608569.901562, 5288364.771866 3608585.141629)))
ways_reprojected.map(item => item._2(1)).take(1)
res34: Array[String] = Array(5288364.771866 3608585.141629)
val ways_unpacked = ways_reprojected.map(item => item._1.toString + "," + item._2(0).toString + "," + item._2(1).toString)
ways_unpacked: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[679] at map at command-197980058855265:1
val rdd_first_set = ways_unpacked.mapPartitions(_.map(line =>{val parts = line.replaceAll("\"","").split(',');val arrCoords = parts.slice(1, parts.length).map(xyStr => {val xy = xyStr.split(' ');(xy(0).toDouble.toInt, xy(1).toDouble.toInt)});new GMLineString(parts(0), arrCoords)}))
rdd_first_set: org.apache.spark.rdd.RDD[org.cusp.bdi.gm.geom.GMLineString] = MapPartitionsRDD[680] at mapPartitions at command-197980058855267:1
rdd_first_set.count()
res35: Long = 730237
def unpack_lat(str: String): String = {
val lat = str.replaceAll(",","").replaceAll("\\[","").split(" ")(2)
return lat
}
spark.udf.register("unpack_lat", unpack_lat(_:String): String)
def unpack_lon(str: String): String = {
val lon = str.replaceAll(",","").replaceAll("\\[","").split(" ")(1)
return lon
}
spark.udf.register("unpack_lon", unpack_lon(_:String): String)
unpack_lat: (str: String)String
unpack_lon: (str: String)String
res36: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,StringType,Some(List(StringType)))
val geoMatchSecond = new GeoMatch(false, 256, 200, (-1, -1, -1, -1)) // n (= dimension of the Hilbert curve) should be a power of 2
geoMatchSecond: org.cusp.bdi.gm.GeoMatch = GeoMatch(false,256,200.0,(-1,-1,-1,-1))
val resultRDDsecond = geoMatchSecond.spatialJoinKNN(rdd_first_set, rddSecondSetSecondRound, 1, false)
resultRDDsecond: org.apache.spark.rdd.RDD[(org.cusp.bdi.gm.geom.GMPoint, scala.collection.mutable.ListBuffer[org.cusp.bdi.gm.geom.GMLineString])] = MapPartitionsRDD[693] at mapPartitions at GeoMatch.scala:94
resultRDDsecond.map(element => (element._1.payload, element._2.map(_.payload))).filter(element => (element._2.isEmpty)).count()
res37: Long = 275
The next step is to obtain, for each state, the count of matched events.
val res = resultRDDsecond.map(element => (element._1.payload, element._2.map(_.payload))).filter(element => !(element._2.isEmpty))
res: org.apache.spark.rdd.RDD[(String, scala.collection.mutable.ListBuffer[String])] = MapPartitionsRDD[697] at filter at command-197980058855277:1
val res_df = res.map(element => (element._1, element._2(0))).toDF("PointId", "State")
res_df: org.apache.spark.sql.DataFrame = [PointId: string, State: string]
val edge_counts = res_df.groupBy("State").count
edge_counts: org.apache.spark.sql.DataFrame = [State: string, count: bigint]
edge_counts.show(3, false)
Output:
+--------------------------------+-----+
|State |count|
+--------------------------------+-----+
|LineString:469327286+3637433937 |a |
|LineString:2488853231+272553182 |b |
|LineString:5074963276+2221962222|c |
+--------------------------------+-----+
val res1 = resultRDD.map(element => (element._1.payload, element._2.map(_.payload))).filter(element => !(element._2.isEmpty))
val res1_df = res1.map(element => (element._1, element._2(0))).toDF("PointId", "State")
val intersection_counts = res1_df.groupBy("State").count
res1: org.apache.spark.rdd.RDD[(String, scala.collection.mutable.ListBuffer[String])] = MapPartitionsRDD[706] at filter at command-197980058855280:1
res1_df: org.apache.spark.sql.DataFrame = [PointId: string, State: string]
intersection_counts: org.apache.spark.sql.DataFrame = [State: string, count: bigint]
import org.apache.spark.sql.functions._
val state_counts = edge_counts.union(intersection_counts)
state_counts.agg(sum("count")).show()
+----------+
|sum(count)|
+----------+
| 11714|
+----------+
import org.apache.spark.sql.functions._
state_counts: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [State: string, count: bigint]
Find the states with no matched events, assign them a count of 0, and union them with the rest of the state counts.
val all_intersection_states = rdd_first_set_intersections.toDF("stateId", "coords").drop("coords")
val all_edge_states = rdd_first_set.toDF("stateId", "coords").drop("coords")
val all_states = all_intersection_states.union(all_edge_states)
all_states.count
all_intersection_states: org.apache.spark.sql.DataFrame = [stateId: string]
all_edge_states: org.apache.spark.sql.DataFrame = [stateId: string]
all_states: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [stateId: string]
res49: Long = 892562
val s1 = all_states.join(state_counts, all_states("stateId") === state_counts("State"), "left_outer").drop("State")
val s_final = s1.na.fill(0)
s1: org.apache.spark.sql.DataFrame = [stateId: string, count: bigint]
s_final: org.apache.spark.sql.DataFrame = [stateId: string, count: bigint]
Posterior - Conditional Distribution of State Counts for a Given Time Unit
Stavroula Rafailia Vlachou (LinkedIn) and Raazesh Sainudiin (LinkedIn).
This project was supported by SENSMETRY through a Data Science Project Internship (2022-01-17 to 2022-06-05) awarded to Stavroula R. Vlachou, and by the
Databricks University Alliance with infrastructure credits from AWS to
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.
2022, Uppsala, Sweden
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.apache.spark.sql.expressions.Window
import spark.implicits._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.apache.spark.sql.expressions.Window
import spark.implicits._
val different_dates = spark.read.parquet("/FileStore/tables/LTaccidents_id_date.parquet").toDF("id", "date").orderBy($"date".asc).select("date").rdd.map(element => element(0)).collect.toSet;
val distinct_dates = different_dates.toList;
//the conditional distribution for each state given a time unit
def conditional_distribution(sample_date: String): org.apache.spark.sql.DataFrame = {
import spark.implicits._
val id_date = spark.read.parquet("/FileStore/tables/LTaccidents_id_date.parquet").toDF("id", "date")
val matched_events = spark.read.parquet("dbfs:/_checkpoint/GeoMatch_G0").toDF("point", "state")
val state_counts = matched_events.join(id_date, matched_events("point") === id_date("id"), "inner").drop("id").where($"date" === sample_date).groupBy("state").count()
val global_count = state_counts.count.toFloat // number of states with at least one matched event on sample_date
val state_space = spark.read.parquet("dbfs:/_checkpoint/StateSpaceInitialG0").toDF("initial_state","count").drop("count")
val per_state_conditional_counts = state_space.join(state_counts, state_space("initial_state") === state_counts("state"), "left_outer").na.fill(0, Seq("count"))
val number_of_states = state_space.count.toFloat
val all_state_counts = per_state_conditional_counts.select("initial_state", "count").withColumn("prior", lit(1f/number_of_states)).orderBy($"count".asc) // uniform prior over the state space
val df = all_state_counts.select(col("initial_state"), col("count").cast(FloatType), col("prior")).withColumn("global_count", lit(global_count))
val posteriors = df.selectExpr("initial_state", "count + prior as posterior", "global_count")
val posterior_means = posteriors.selectExpr("initial_state","posterior/(global_count + 1) as posterior_mean").orderBy($"posterior_mean".asc)
posterior_means.createOrReplaceTempView("posterior_means")
val df_1 = spark.sql("select initial_state, posterior_mean,"+" SUM(posterior_mean) over ( order by initial_state rows between unbounded preceding and current row ) cumulative_Sum " + " from posterior_means").toDF("initial_state", "posterior_mean", "cumulative_Sum")
val df_2 = df_1.withColumn("prob_interval", lag($"cumulative_Sum", 1,0).over(Window.orderBy($"cumulative_Sum".asc))).select("initial_state", "prob_interval", "cumulative_Sum")
val probability_intervals = df_2.selectExpr("initial_state", "(prob_interval, cumulative_Sum) as prob_interval")
return probability_intervals
}
conditional_distribution: (sample_date: String)org.apache.spark.sql.DataFrame
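Inside `conditional_distribution`, the per-state posterior mean is `(count + prior) / (global_count + 1)`, with a uniform prior over the state space. A minimal plain-Scala sketch of the same arithmetic on hypothetical counts (the state names and values below are made up for illustration):

```scala
// Toy data: matched event counts per state for one day (hypothetical)
val counts = Map("s1" -> 3f, "s2" -> 0f, "s3" -> 1f)
val prior = 1f / counts.size                          // uniform prior over 3 states
val globalCount = counts.values.count(_ > 0).toFloat  // states with >= 1 matched event
val posteriorMeans = counts.map { case (s, c) => s -> (c + prior) / (globalCount + 1f) }
```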
//run only once per cluster
import org.apache.spark.sql.SaveMode // needed for SaveMode.Overwrite below
for (date <- distinct_dates){
val a = date.toString
var directory = "dbfs:/roadSafety"
val probabilities = conditional_distribution(sample_date=a)
directory += "_" + a
dbutils.fs.mkdirs(directory)
probabilities.write.mode(SaveMode.Overwrite).parquet(directory + "_CD")
probabilities.unpersist
display(dbutils.fs.ls(directory))
}
//The distribution of states independent of time
def unconditional_distribution(): org.apache.spark.sql.DataFrame = {
import spark.implicits._
val state_space = spark.read.parquet("dbfs:/_checkpoint/StateSpaceInitialG0").toDF("initial_state","count").drop("count")
val number_of_states = state_space.count.toFloat
val priors = state_space.select("initial_state").withColumn("prior", lit(1f/number_of_states))
priors.createOrReplaceTempView("priors")
val df_1 = spark.sql("select initial_state, prior,"+" SUM(prior) over ( order by initial_state rows between unbounded preceding and current row ) cumulative_Sum " + " from priors").toDF("initial_state", "prior", "cumulative_Sum")
val df_2 = df_1.withColumn("prob_interval", lag($"cumulative_Sum", 1,0).over(Window.orderBy($"cumulative_Sum".asc))).select("initial_state", "prob_interval", "cumulative_Sum")
val probability_intervals = df_2.selectExpr("initial_state", "(prob_interval, cumulative_Sum) as prob_interval")
return probability_intervals
}
unconditional_distribution: ()org.apache.spark.sql.DataFrame
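In both distribution functions, the window-function pair (a cumulative `SUM ... OVER`, then `lag`) turns per-state probabilities into half-open intervals that partition [0, 1], ready for inversion sampling. The same construction in plain Scala, on made-up probabilities:

```scala
val probs = List(("a", 0.25), ("b", 0.5), ("c", 0.25)) // hypothetical state probabilities
// running cumulative sums, as in the SUM(...) OVER (...) window
val cum = probs.scanLeft(("", 0.0)) { case ((_, acc), (s, p)) => (s, acc + p) }.tail
// pair each cumulative sum with the lagged one to get (state, lower, upper)
val intervals = cum.zip(0.0 :: cum.map(_._2).init).map { case ((s, upper), lower) => (s, lower, upper) }
// intervals: List((a,0.0,0.25), (b,0.25,0.75), (c,0.75,1.0))
```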
unconditional_distribution().write.mode("overwrite").parquet("dbfs:/roadSafety_no_date_CD")
Simulating the Arrival Times of a NHPP by Inversion
import org.apache.spark.mllib.random._
import math.{log, floor, ceil}
import org.apache.spark.sql.functions._
import scala.util.{Try,Success,Failure}
import scala.util.control.Exception
import org.apache.spark.mllib.random.RandomRDDs._
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.mllib.random._
import math.{log, floor, ceil}
import org.apache.spark.sql.functions._
import scala.util.{Try, Success, Failure}
import scala.util.control.Exception
import org.apache.spark.mllib.random.RandomRDDs._
import scala.collection.mutable.ArrayBuffer
- Load the arrival times of the events from one realization of the process
val df = spark.read.parquet("/FileStore/tables/LT_time_intervals").select("prev_date")
df: org.apache.spark.sql.DataFrame = [prev_date: bigint]
val ordered_T = df.collect().toArray :+ 1461 // append the end of the observation window (1461 days: 2017-01-01 to 2020-12-31)
val generator = new UniformGenerator()
generator.setSeed(1234L) //set the seed for reproducibility of results
generator: org.apache.spark.mllib.random.UniformGenerator = org.apache.spark.mllib.random.UniformGenerator@2c211685
//initialization
var i = 1
var u = generator.nextValue
var E = -math.log(1-u)
var T = 0.0
var m = 0.0
var width = 0.0
var samples = Array[Double]()
val n = 11720 //number of total observations
val k = 1 //number of realisations
i: Int = 1
u: Double = 0.9499610869333489
E: Double = 2.9949543149092834
T: Double = 0.0
m: Double = 0.0
width: Double = 0.0
samples: Array[Double] = Array()
n: Int = 11720
k: Int = 1
while (E < n/k){
m = math.floor(((n + 1.0)*k/n)*E) // (n + 1.0) forces floating-point division; (n+1)*k/n would truncate to 1 in Int arithmetic
width = ordered_T(m.toInt+1).toString.replaceAll("\\[", "").replaceAll("\\]", "").toDouble - ordered_T(m.toInt).toString.replaceAll("\\[", "").replaceAll("\\]", "").toDouble
T = ordered_T(m.toInt).toString.replaceAll("\\[", "").replaceAll("\\]", "").toDouble + width * (((n + 1.0)*k/n)*E - m)
samples = samples :+ T
i += 1
u = generator.nextValue
E -= math.log(1-u)
}
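The loop above is the standard inversion method for a non-homogeneous Poisson process: cumulative Exp(1) increments `E` are pushed through a piecewise-linear inverse of the empirical cumulative arrival-time function. A self-contained sketch with toy data and `k = 1` (the arrival times below are made up, and plain `scala.util.Random` stands in for the MLlib generator):

```scala
val rng = new scala.util.Random(1234L)
// hypothetical ordered arrival times of one observed realisation, with endpoints
val orderedT = Array(0.0, 2.0, 5.0, 9.0, 14.0)
val nObs = orderedT.length - 1            // number of observed inter-arrival steps
var e = -math.log(1 - rng.nextDouble())   // first arrival of a unit-rate process
val sampled = scala.collection.mutable.ArrayBuffer[Double]()
while (e < nObs) {
  val m = math.floor(e).toInt
  // linear interpolation between consecutive observed arrival times
  sampled += orderedT(m) + (orderedT(m + 1) - orderedT(m)) * (e - m)
  e -= math.log(1 - rng.nextDouble())
}
```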
val arrival_samples = sc.parallelize(samples)
val rounded_arrivals = arrival_samples.map(item => math.ceil(item))
val sample_df = rounded_arrivals.toDF("day").groupBy("day").count.orderBy($"day".asc)
arrival_samples: org.apache.spark.rdd.RDD[Double] = ParallelCollectionRDD[14874] at parallelize at command-1211269020742804:1
rounded_arrivals: org.apache.spark.rdd.RDD[Double] = MapPartitionsRDD[14875] at map at command-1211269020742804:2
sample_df: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [day: double, count: bigint]
sample_df.count() //number of simulated days
sample_df.select(sum("count")).show() //number of simulated events
+----------+
|sum(count)|
+----------+
| 11755|
+----------+
val times = sample_df
val initialisation = sc.parallelize(Seq((" ", 0.0))).toDF("initial_state", "time_unit")
times: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [day: double, count: bigint]
initialisation: org.apache.spark.sql.DataFrame = [initial_state: string, time_unit: double]
val time_day_map = spark.sql("SELECT sequence(to_date('2017-01-01'), to_date('2020-12-31'), interval 1 day) as dates").select(explode($"dates").alias("day_of_year"), (monotonically_increasing_id + 1).alias("time_unit")) // note: monotonically_increasing_id is only consecutive within a single partition
val initial = ArrayBuffer[(String, Double)]()
val times_list = times.collect()
for (time <- times_list){
val day = time_day_map.filter(col("time_unit") === time(0)).select("day_of_year").collect()(0)(0).toString
val count = time(1).asInstanceOf[Long]
try {val conditional_distribution = spark.read.parquet("dbfs:/roadSafety_" + day + "_CD").select($"initial_state", $"prob_interval._1".alias("start"), $"prob_interval._2".alias("end"))
val uniform_samples = uniformRDD(sc,count).toDF()
val cross_samples_intervals = uniform_samples.crossJoin(conditional_distribution)
val samples = cross_samples_intervals.filter("start < value").filter("end >= value").select("initial_state").cache()
val location_time = samples.rdd.map(item => (item(0).toString, time(0).asInstanceOf[Double])).collect()
initial ++= location_time
samples.unpersist
println(time(0).toString)
}
catch {
case u: org.apache.spark.sql.AnalysisException => {
println("Path does not exist " + day + ". Sampling independent of time")
val conditional_distribution = spark.read.parquet("dbfs:/roadSafety_no_date_CD").select($"initial_state", $"prob_interval.prob_interval".alias("start"), $"prob_interval.cumulative_Sum".alias("end"))
val uniform_samples = uniformRDD(sc,count).toDF()
val cross_samples_intervals = uniform_samples.crossJoin(conditional_distribution)
val samples = cross_samples_intervals.filter("start < value").filter("end >= value").select("initial_state").cache()
val location_time = samples.rdd.map(item => (item(0).toString, time(0).asInstanceOf[Double])).collect()
initial ++= location_time
samples.unpersist
println(time(0).toString)
}
}
}
1.0
2.0
3.0
...
Path does not exist 2017-02-09. Sampling independent of time
...
1412.0
(output truncated: one line per sampled day up to 1412.0; days without a saved conditional distribution fall back to the time-independent one)
time_day_map: org.apache.spark.sql.DataFrame = [day_of_year: date, time_unit: bigint]
initial: scala.collection.mutable.ArrayBuffer[(String, Double)] = ArrayBuffer((4368444509,1.0), (2424668863+4975677371,1.0), (1625682383,2.0), (2370300562+2370300566,2.0), (3205833256,2.0), ...) (output truncated)
(819584374,103.0), (53829919,103.0), (4014341532+4014341532,104.0), (315548178,104.0), (32329191,104.0), (928079737+928080182,104.0), (429634387,104.0), (315548178,104.0), (315548178,104.0), (428001748,104.0), (258334235,105.0), (816440536+33184937,106.0), (1481655829,106.0), (2411610303+1991224693,106.0), (1481655829,106.0), (3796306736+3796306731,106.0), (499334549,107.0), (6152986966,107.0), (4297075907,108.0), (3390170283,108.0), (337174516,108.0), (337174516,108.0), (698986986+728464190,108.0), (337174516,108.0), (4416859006+4416859006,108.0), (502981961,108.0), (1669388836+2454216733,109.0), (728972240,109.0), (666943301,109.0), (538957968,109.0), (8708590595,109.0), (1525335651,109.0), (32603144,109.0), (1530707872+305220293,110.0), (5291441321+4248151334,110.0), (984408218,110.0), (2060253704,112.0), (510070404,112.0), (2060253704,112.0), (5041997631,112.0), (3954617988+3954617988,112.0), (430784582,112.0), (2338538872,112.0), (430784582,112.0), (2060253704,112.0), (430784582,112.0), (31294796,112.0), (510070404,112.0), (364842112+364842217,112.0), (40563567+40563576,112.0), (841522850,113.0), (1422198764,113.0), (880857400,113.0), (2322675730+2323171669,113.0), (1869130003+6328411651,114.0), (5231165084+32083794,114.0), (1869130003+6328411651,114.0), (34825447,114.0), (5329978798+4119457188,114.0), (7076643916+2088206020,115.0), (32336967+662024171,116.0), (824900014+290372103,116.0), (3755666372,117.0), (279058700+2885304357,117.0), (454487103+454487105,117.0), (4332324526,117.0), (7201262716,117.0), (279058700+2885304357,117.0), (7201262716,117.0), (4314404998+995040223,117.0), (4314404998+995040223,117.0), (8778774527,119.0), (31440405,119.0), (293601806,120.0), (427122850,120.0), (293601806,120.0), (8242746297,120.0), (8242746297,120.0), (4695046213+4695046213,120.0), (4695046213+4695046213,120.0), (2234858278+2234619485,120.0), (4695046213+4695046213,120.0), (2234858278+2234619485,120.0), (775866761,120.0), (1057743807+3319735019,121.0), 
(79672822,121.0), (671452932,121.0), (2249967174,121.0), (671452932,121.0), (5163798580+5163798581,122.0), (32337924+763634384,122.0), (421368977+1684694008,122.0), (1145771998+822048855,122.0), (924615848+303583180,122.0), (850776167,123.0), (761890866,123.0), (741684488+741684516,124.0), (1530349807,124.0), (1530349807,124.0), (2379979513+2364390338,124.0), (664158323,124.0), (1530349807,124.0), (33143924+8647346243,124.0), (2208884052,124.0), (664158323,124.0), (2914930062,124.0), (664158323,124.0), (1576250536,124.0), (2313116061,124.0), (3905293672+3905293672,124.0), (99171112+468098177,124.0), (2914930062,124.0), (2208884052,124.0), (2379979513+2364390338,124.0), (2914930062,124.0), (1747606452+371714002,124.0), (2033208931,124.0), (1530349807,124.0), (1892263414,124.0), (5834149249,124.0), (4021523900+4021523900,124.0), (32449296+32449294,125.0), (1508768578,125.0), (3817042384+3817042384,125.0), (1508768578,125.0), (638949148+867132773,125.0), (3817042384+3817042384,125.0), (1508768578,125.0), (32449296+32449294,125.0), (2637895619+2043449691,125.0), (2522426546+1833291602,125.0), (1001268109,125.0), (32449296+32449294,125.0), (1885204134+3025020911,125.0), (1001268109,125.0), (282776849+31447762,126.0), (1532706305,126.0), (59971295,126.0), (32324664,126.0), (2699991905+3946708557,126.0), (32324664,126.0), (32324664,126.0), (838630723,126.0), (421369053+1500341905,126.0), (1136532100+3603289690,126.0), (1857324410,126.0), (421369053+1500341905,126.0), (1857324410,126.0), (2417963673,126.0), (2417963673,126.0), (838630723,126.0), (59971295,126.0), (371712696+371712930,126.0), (428001748+837556746,127.0), (8990895368,127.0), (875843369+99159011,127.0), (8296159895,127.0), (8917108974,127.0), (428001748+837556746,127.0), (428001748+837556746,127.0), (673605204,128.0), (1612176500+2544330449,128.0), (1612176500+2544330449,128.0), (2071749918,128.0), (2071749918,128.0), (1612176500+2544330449,128.0), (411397954,128.0), (673605204,128.0), 
(1612176500+2544330449,128.0), (861783706+861782814,128.0), (673605204,128.0), (2071749918,128.0), (3617407575+32341473,129.0), (3617407575+32341473,129.0), (560477778+560477826,129.0), (1534411751+1534411469,129.0), (3617407575+32341473,129.0), (1534411751+1534411469,129.0), (1306730546,129.0), (4559143062,129.0), (3617407575+32341473,129.0), (1513211288+1513211288,130.0), (1144004920+1144004896,130.0), (2588955866,130.0), (36806444,130.0), (977638371+3592236210,130.0), (977638371+3592236210,130.0), (977638371+3592236210,130.0), (8299422859+5745389608,131.0), (33140425+2320320623,131.0), (2049428806,131.0), (997670026,131.0), (1201326476,131.0), (33140425+2320320623,131.0), (4319726742+4319726758,131.0), (9076976196+9076976196,131.0), (8299422859+5745389608,131.0), (9076976196+9076976196,131.0), (1597662295,131.0), (9076976196+9076976196,131.0), (997670026,131.0), (701189135,131.0), (2049428806,131.0), (41850576+78151900,131.0), (41850576+78151900,131.0), (33140425+2320320623,131.0), (2130812867,131.0), (2049428806,131.0), (41850576+78151900,131.0), (1201326476,131.0), (454487103+454487105,132.0), (7829481088,132.0), (162360518+250998985,132.0), (2868207489,132.0), (343467289+722953613,132.0), (343467289+722953613,132.0), (343467289+722953613,132.0), (2868207489,132.0), (32843212,132.0), (796013109,133.0), (2265022473+332270108,133.0), (5041997631,133.0), (560477894,133.0), (459884206,133.0), (560477894,133.0), (3728924143+864598775,133.0), (9276069144+3404472948,133.0), (509915318,133.0), (796013109,133.0), (509915318,133.0), (509915318,133.0), (5041997631,133.0), (459884206,133.0), (1017873799+1017873843,133.0), (459884206,133.0), (459884206,133.0), (9276069144+3404472948,133.0), (2098014328,134.0), (1479204328+1479204409,134.0), (2441490656+117053372,134.0), (461668700,134.0), (699790533+699790564,134.0), (1399469404,135.0), (1399469404,135.0), (1928971453,136.0), (1092778941+9626313335,136.0), (1928971453,136.0), (4487066323+1937844649,136.0), 
(364152927,136.0), (430771927,136.0), (769311044+1598659408,136.0), (1091364909+8245890249,136.0), (769311044+1598659408,136.0), (1092778941+9626313335,136.0), (2534694080+9227149677,136.0), (769311044+1598659408,136.0), (919597247+417492988,136.0), (821373861+2119428072,136.0), (420760611,136.0), (769311044+1598659408,136.0), (5698711201,136.0), (2520255231+2034323927,136.0), (181324580,136.0), (2520255231+2034323927,136.0), (7294603948+7294603949,136.0), (430771927,136.0), (79668785,137.0), (73419186,137.0), (31448343,137.0), (519329104,137.0), (504642878,137.0), (2351498507+2351498445,137.0), (2351498507+2351498445,137.0), (73419186,137.0), (2211708906,137.0), (372329733,137.0), (31448343,137.0), (31448343,137.0), (504642878,137.0), (818675978+560064854,138.0), (58304256+60002230,138.0), (722953640,138.0), (73419192,138.0), (73419192,138.0), (33732301,138.0), (722953640,138.0), (33732301,138.0), (1104466256+474708878,138.0), (1152426544,138.0), (73419192,138.0), (58304256+60002230,138.0), (1152426544,138.0), (33732301,138.0), (2378979201+1744549002,139.0), (2034521571+2034521558,139.0), (2378979201+1744549002,139.0), (2608166052,139.0), (8322485005+4406661355,139.0), (2608166052,139.0), (8246161251,140.0), (3524132662+3524132656,140.0), (8246161251,140.0), (287914012,140.0))
times_list: Array[org.apache.spark.sql.Row] = Array([1.0,2], [2.0,10], [3.0,5], [4.0,10], [5.0,5], ... (long array of [arrival_time, count] rows elided) ..., [1010.0,7], [1011.0,10])
val location_simulation = initial.toDF("simulated_location", "arrival_time")
location_simulation: org.apache.spark.sql.DataFrame = [simulated_location: string, arrival_time: double]
location_simulation.show(3)
+--------------------+------------+
| simulated_location|arrival_time|
+--------------------+------------+
| 4368444509| 1.0|
|2424668863+497567...| 1.0|
| 1625682383| 2.0|
+--------------------+------------+
only showing top 3 rows
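As the rows above suggest, each `simulated_location` is a string of one or two `+`-joined node ids. As a minimal sketch (the column name `node_ids` is illustrative, not from the original notebook), the ids could be split into an array column for downstream joins:

```scala
import org.apache.spark.sql.functions.{col, split}

// Hypothetical helper: split "id1+id2" location strings into an array of ids.
// Single-id locations simply yield a one-element array.
val withIds = location_simulation
  .withColumn("node_ids", split(col("simulated_location"), "\\+"))
withIds.show(3)
```

Note that `+` must be escaped in the regex passed to `split`, since it is a regex quantifier.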
Transformation of coordinates using the ArcGIS Runtime library
Virginia Jimenez Mohedano (LinkedIn), Stavroula Rafailia Vlachou (LinkedIn) and Raazesh Sainudiin (LinkedIn).
This project was supported by UAB SENSMETRY through a Data Science Thesis Internship
(2022-01-17 to 2022-06-05) awarded to Stavroula R. Vlachou and Virginia J. Mohedano,
and by the Databricks University Alliance with infrastructure credits from AWS to
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.
2022, Uppsala, Sweden
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types._
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql._
import scala.util.matching.Regex
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types._
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql._
import scala.util.matching.Regex
import com.esri.arcgisruntime.geometry.{Point, SpatialReference, GeometryEngine}
import com.esri.arcgisruntime.geometry.GeometryEngine.project
import com.esri.arcgisruntime._
The ArcGIS Runtime library allows for coordinate transformations.
- Download the ArcGIS Runtime SDK (.tgz) from https://developers.arcgis.com/downloads/#java
- Install the jar (from the "libs" folder) on the cluster.

The version downloaded was 100.4.0.
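As a minimal sketch of the kind of transformation this library enables (the coordinates below are illustrative, roughly Uppsala, and not taken from the data; this assumes the jar and its `jniLibs` native libraries are installed on the cluster), a longitude/latitude point can be projected to another spatial reference with `GeometryEngine.project`:

```scala
import com.esri.arcgisruntime.geometry.{GeometryEngine, Point, SpatialReferences}

// Build a point in WGS84 (longitude, latitude) and project it to
// Web Mercator, whose coordinates are in metres.
val lonLat = new Point(17.64, 59.86, SpatialReferences.getWgs84())
val webMercator = GeometryEngine
  .project(lonLat, SpatialReferences.getWebMercator())
  .asInstanceOf[Point]
println(s"x = ${webMercator.getX}, y = ${webMercator.getY}")
```

`project` returns a `Geometry`, hence the cast back to `Point`; well-known ids (e.g. `SpatialReference.create(3006)` for SWEREF 99 TM) can be used in place of the predefined references.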
dbutils.fs.mkdirs("dbfs:/arcGISRuntime/")
res1: Boolean = true
tar zxvf /dbfs/arcGISRuntime/arcgis_runtime_sdk_java_100_4_0.tgz -C /dbfs/arcGISRuntime
arcgis-runtime-sdk-java-100.4.0/
arcgis-runtime-sdk-java-100.4.0/LICENSE.txt
arcgis-runtime-sdk-java-100.4.0/README.txt
arcgis-runtime-sdk-java-100.4.0/RELEASE-NOTES.txt
arcgis-runtime-sdk-java-100.4.0/jniLibs/
arcgis-runtime-sdk-java-100.4.0/jniLibs/OSX64/
arcgis-runtime-sdk-java-100.4.0/jniLibs/OSX64/libruntimecore.dylib
arcgis-runtime-sdk-java-100.4.0/jniLibs/OSX64/libruntimecore_java.dylib
arcgis-runtime-sdk-java-100.4.0/jniLibs/LX64/
arcgis-runtime-sdk-java-100.4.0/jniLibs/LX64/libruntimecore_java.so
arcgis-runtime-sdk-java-100.4.0/jniLibs/LX64/libruntimecore.so
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/
... (long listing of DirectX shader .cso files elided) ...
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_dd_solid_fill_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/seq_render_outlined_area_halo_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/trianglemesh_color_phong_world_draw_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_outline_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/polyline_world_depth_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_dd_outline_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/atmosphere_accurate_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/tex_quad_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/texture_draw_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/seq_render_line_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/tile_world_depth_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_tile_info_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_background_pattern_fill_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/skybox_draw_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_dd_pattern_line_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/seq_render_sdf_point_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/seq_render_area_pick_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/trianglemesh_texture_phongshadow_world_instance_draw_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/trianglemesh_outline_draw_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_sdf_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/seq_render_outlined_area_halo_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/point_billboard_draw_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/star_draw_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/text_draw_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_fill_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/trianglemesh_texture_phongshadow_world_draw_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_circle_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/seq_render_outlined_area_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/polyline_draw_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_line_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/tex_coor_to_tex_coor_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/trianglemesh_color_phongshadow_world_draw_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/trianglemesh_world_depth_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/tex_quad_b8g8r8a8un_custom_filter_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/image_renderer_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_pattern_line_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/seq_render_line_pick_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_tile_info_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_pattern_line_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/seq_render_sdf_point_halo_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/star_draw_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/tex_quad_b8g8r8a8un_adv_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_background_solid_fill_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/polygon_outline_overlay_draw_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/polygon_outline_overlay_draw_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/vector_tiles_dd_sdf_ps.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/trianglemesh_color_phongshadow_world_instance_draw_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/texture_draw_instanced_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/trianglemesh_texture_world_depth_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/tile_atmosphere_phong_world_draw_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/polygon_draw_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/screen_image_renderer_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/measure_line_world_draw_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/tile_phongshadow_world_draw_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/image_renderer_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/directx/polyline_world_depth_vs.cso
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN64/
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN64/msvcp140.dll
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN64/runtimecore.dll
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN64/concrt140.dll
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN64/runtimecore_java.dll
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN64/vcruntime140.dll
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN64/vccorlib140.dll
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN32/
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN32/msvcp140.dll
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN32/runtimecore.dll
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN32/concrt140.dll
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN32/runtimecore_java.dll
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN32/vcruntime140.dll
arcgis-runtime-sdk-java-100.4.0/jniLibs/WIN32/vccorlib140.dll
arcgis-runtime-sdk-java-100.4.0/legal/
arcgis-runtime-sdk-java-100.4.0/legal/third-party-software-acknowledgements.pdf
arcgis-runtime-sdk-java-100.4.0/legal/EULA.pdf
arcgis-runtime-sdk-java-100.4.0/legal/Copyright_and_Trademarks.pdf
arcgis-runtime-sdk-java-100.4.0/libs/
arcgis-runtime-sdk-java-100.4.0/libs/arcgis-java-api-javadoc.jar
arcgis-runtime-sdk-java-100.4.0/libs/commons-logging-1.2.jar
arcgis-runtime-sdk-java-100.4.0/libs/commons-codec-1.11.jar
arcgis-runtime-sdk-java-100.4.0/libs/arcgis-java-api.jar
arcgis-runtime-sdk-java-100.4.0/libs/gson-2.8.5.jar
arcgis-runtime-sdk-java-100.4.0/resources/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/alaska.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/icegrid2004.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/icegrid2004.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/stlrnc.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/stpaul.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/stgeorge.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/prvi.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/ICEGRID93.LOS
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/prvi.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/stgeorge.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/hawaii.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/stpaul.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/ICEGRID93.LAS
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/conus.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/stlrnc.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/hawaii.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/conus.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/nadcon/alaska.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/hvtdefaults.json
arcgis-runtime-sdk-java-100.4.0/resources/pedata/vertical/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/vertical/egm/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/vertical/egm/egm96.grd
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gtdefaults.json
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/newzealand/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/newzealand/nzgd2kgrid0005.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/brazil/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/brazil/SAD69_003.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/brazil/CA7072_003.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/brazil/SAD96_003.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/brazil/CA61_003.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/austria/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/austria/AT_GIS_GRID.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/japan/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/japan/tky2jgd.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/japan/touhokutaiheiyouoki2011.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/france/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/france/rgf93_ntf.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/france/RGNC1991_IGN72GrandeTerre.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/france/RGNC1991_NEA74Noumea.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/ireland/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/ireland/tm75_etrs89.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/uk/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/uk/osgb36_xrail84.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/uk/OSTN02_NTv2.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/uk/OSTN15_NTv2.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/netherlands/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/netherlands/rdtrans2008.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/portugal/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/portugal/D73_ETRS89_geo.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/portugal/DLX_ETRS89_geo.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/switzerland/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/switzerland/CHENYX06_etrs.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/switzerland/CHENYX06.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/australia/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/australia/National_84_02_07_01.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/australia/A66_National_13_09_01.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/spain/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/spain/100800401.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/spain/peninsula.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/spain/baleares.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/spain/SPED2ETV2.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/germany/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/germany/BETA2007.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/ntv2/germany/NTv2_SN.gsb
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/kshpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/flhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wmhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/c2hpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/ohhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/uthpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/imhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mshpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/cnhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/lahpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/arhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/iahpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/tnhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nvhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nchpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/ethpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/ethpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/c1hpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/ndhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/cshpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/alhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/cnhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/njhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wvhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/pvhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wshpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wyhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/ohdhihpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/pahpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wyhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mehpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/lahpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nchpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/alhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/azhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/hihpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nvhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wshpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/cohpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wthpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/ilhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/schpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/flhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/okhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mohpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/okhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wohpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nbhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/sdhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/azhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/inhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/ndhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wohpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mnhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wmhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/eshpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mdhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/emhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/eshpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/iahpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wihpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mihpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nbhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/schpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/vahpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mshpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nmhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/imhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mnhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/guhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wvhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/vahpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/pahpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/pvhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nmhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wihpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nyhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nehpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mohpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/cohpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/kshpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/kyhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nehpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/emhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/ohhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/tnhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/cshpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/guhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/njhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/kyhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mihpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/uthpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/c1hpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/gahpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/nyhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/ohdhihpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/wthpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/hihpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/arhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/sdhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/gahpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mdhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/ilhpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/inhpgn.los
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/c2hpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/harn/mehpgn.las
arcgis-runtime-sdk-java-100.4.0/resources/pedata/geoid/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/geoid/WGS84.img
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/inspire_cp_CadastralBoundary.gfs
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/esri_StatePlane_extra.wkt
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/gt_ellips.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/gt_datum.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/geoccs.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/vdv452.xsd
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/compdcs.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/netcdf_config.xsd
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/s57attributes.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/gml_registry.xml
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/s57objectclasses_aml.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/gdalvrt.xsd
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/inspire_cp_CadastralParcel.gfs
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/nitf_spec.xsd
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/esri_Wisconsin_extra.wkt
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/ogrvrt.xsd
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/s57agencies.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/vdv452.xml
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/s57objectclasses_iw.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/seed_3d.dgn
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/GDALLogoBW.svg
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/prime_meridian.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/s57objectclasses.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/coordinate_axis.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/pci_datum.txt
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/ozi_datum.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/header.dxf
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/pcs.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/epsg.wkt
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/gdal_datum.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/projop_wparm.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/pcs.override.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/ecw_cs.wkt
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/inspire_cp_CadastralZoning.gfs
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/gdalicon.png
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/GDALLogoColor.svg
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/ruian_vf_ob_v1.gfs
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/ruian_vf_st_v1.gfs
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/pci_ellips.txt
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/ruian_vf_st_uvoh_v1.gfs
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/s57expectedinput.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/stateplane.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/gcs.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/ruian_vf_v1.gfs
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/GDALLogoGS.svg
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/ozi_ellips.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/osmconf.ini
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/s57attributes_aml.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/cubewerx_extra.wkt
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/trailer.dxf
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/esri_extra.wkt
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/ellipsoid.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/gcs.override.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/datum_shift.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/nitf_spec.xml
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/seed_2d.dgn
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/vertcs.override.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/s57attributes_iw.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/unit_of_measure.csv
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/inspire_cp_BasicPropertyUnit.gfs
arcgis-runtime-sdk-java-100.4.0/resources/pedata/gdaldata/vertcs.csv
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/S-52x.stylx
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/S57DataDictionary.xml
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/news57.xml
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/lookup/
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/lookup/asymrefpb.dic
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/lookup/asymrefsb.dic
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/lookup/psymrefs.dic
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/lookup/lsymref.dic
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/lookup/psymreft.dic
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/day.clr
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/day_bright.col
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/day.col
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/day_blackback.col
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/day_whiteback.col
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/dusk.col
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/day_bright.clr
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/day_blackback.clr
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/night.clr
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/dusk.clr
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/night.col
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/s57lookupfiles/colcalib/day_whiteback.clr
arcgis-runtime-sdk-java-100.4.0/resources/hydrography/ECDIS_settings.xml
arcgis-runtime-sdk-java-100.4.0/resources/symbols/
arcgis-runtime-sdk-java-100.4.0/resources/symbols/app6b/
arcgis-runtime-sdk-java-100.4.0/resources/symbols/app6b/app6b.stylx
arcgis-runtime-sdk-java-100.4.0/resources/symbols/mil2525c_b2/
arcgis-runtime-sdk-java-100.4.0/resources/symbols/mil2525c_b2/mil2525c_b2.stylx
arcgis-runtime-sdk-java-100.4.0/resources/symbols/app6d/
arcgis-runtime-sdk-java-100.4.0/resources/symbols/app6d/app6d.stylx
arcgis-runtime-sdk-java-100.4.0/resources/symbols/mil2525d/
arcgis-runtime-sdk-java-100.4.0/resources/symbols/mil2525d/mil2525d.stylx
arcgis-runtime-sdk-java-100.4.0/samples/
arcgis-runtime-sdk-java-100.4.0/samples/arcgis-java-samples-v100.4.0.zip
display(dbutils.fs.ls("dbfs:/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/"))
| path | name | size |
|---|---|---|
| dbfs:/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/LICENSE.txt | LICENSE.txt | 174.0 |
| dbfs:/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/README.txt | README.txt | 1980.0 |
| dbfs:/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/RELEASE-NOTES.txt | RELEASE-NOTES.txt | 5227.0 |
| dbfs:/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/jniLibs/ | jniLibs/ | 0.0 |
| dbfs:/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/legal/ | legal/ | 0.0 |
| dbfs:/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/libs/ | libs/ | 0.0 |
| dbfs:/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/resources/ | resources/ | 0.0 |
| dbfs:/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/samples/ | samples/ | 0.0 |
The library needs to be initialized by running the following cell:
if(!ArcGISRuntimeEnvironment.isInitialized())
{
ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
ArcGISRuntimeEnvironment.initialize()
}
Initializing...
Java version : 1.8.0_282 (Azul Systems, Inc.) amd64
Read the data that needs to be transformed: in this case, OSM node (location) data for Lithuania.
spark.conf.set("spark.sql.parquet.binaryAsString", true)
val nodes_df = spark.read.parquet("dbfs:/datasets/osm/lithuania/lithuania.osm.pbf.node.parquet")
nodes_df: org.apache.spark.sql.DataFrame = [id: bigint, version: int ... 7 more fields]
nodes_df.count()
res1: Long = 21212155
nodes_df.show(1,false)
+--------+-------+-------------+---------+---+--------+----------------------------+----------+------------------+
|id |version|timestamp |changeset|uid|user_sid|tags |latitude |longitude |
+--------+-------+-------------+---------+---+--------+----------------------------+----------+------------------+
|15389886|7 |1427965254000|0 |0 | |[[highway, traffic_signals]]|54.7309125|25.239701200000003|
+--------+-------+-------------+---------+---+--------+----------------------------+----------+------------------+
only showing top 1 row
In this case the coordinates are expressed in the WGS84 system (EPSG:4326), and they will be projected into a metric system (EPSG:3035) for use with GeoMatch. To target other reference systems, one only needs to change the corresponding EPSG codes in the next function.
// Project a WGS84 (EPSG:4326) lon/lat pair into ETRS89-LAEA (EPSG:3035), in meters.
def project_to_meters(lon: Double, lat: Double): String = {
  // Each executor must initialize the ArcGIS Runtime before calling GeometryEngine
  if(!ArcGISRuntimeEnvironment.isInitialized())
  {
    ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
    ArcGISRuntimeEnvironment.initialize()
  }
  val initial_point = new Point(lon, lat, SpatialReference.create(4326))
  val reprojection = GeometryEngine.project(initial_point, SpatialReference.create(3035))
  reprojection.toString
}
spark.udf.register("project_to_meters", project_to_meters(_:Double, _:Double):String)
project_to_meters: (lon: Double, lat: Double)String
res2: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,StringType,Some(List(DoubleType, DoubleType)))
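If several target systems are needed, the hard-coded EPSG codes can be lifted into parameters. Below is a minimal sketch under the same assumptions as the cell above (ArcGIS Runtime installed at the same DBFS path); `projectBetween` and its `fromSrid`/`toSrid` parameters are illustrative names, not part of the original notebook:

```scala
// Hypothetical generalization of project_to_meters: the source and target
// spatial references become arguments instead of hard-coded 4326 / 3035.
def projectBetween(lon: Double, lat: Double, fromSrid: Int, toSrid: Int): String = {
  if (!ArcGISRuntimeEnvironment.isInitialized()) {
    ArcGISRuntimeEnvironment.setInstallDirectory("/dbfs/arcGISRuntime/arcgis-runtime-sdk-java-100.4.0/")
    ArcGISRuntimeEnvironment.initialize()
  }
  val p = new Point(lon, lat, SpatialReference.create(fromSrid))
  GeometryEngine.project(p, SpatialReference.create(toSrid)).toString
}

// e.g. WGS84 -> Web Mercator instead of ETRS89-LAEA:
// spark.udf.register("project_to_mercator",
//   projectBetween(_: Double, _: Double, 4326, 3857): String)
```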
val nodes_converted = nodes_df.selectExpr("id","latitude", "longitude", "project_to_meters(longitude, latitude) as new_coord")
nodes_converted.show(5,false)
+--------+------------------+------------------+---------------------------------------------------------------+
|id |latitude |longitude |new_coord |
+--------+------------------+------------------+---------------------------------------------------------------+
|15389886|54.7309125 |25.239701200000003|Point: [5294624.872733, 3617234.130316, 0.000000, NaN] SR: 3035|
|15389895|54.732171400000006|25.243689500000002|Point: [5294845.235219, 3617425.427234, 0.000000, NaN] SR: 3035|
|15389899|54.7352788 |25.2467356 |Point: [5294962.370295, 3617805.661476, 0.000000, NaN] SR: 3035|
|15389959|54.7355529 |25.2458712 |Point: [5294901.580186, 3617823.871710, 0.000000, NaN] SR: 3035|
|15389961|54.735927100000005|25.245138800000003|Point: [5294846.689805, 3617854.789556, 0.000000, NaN] SR: 3035|
+--------+------------------+------------------+---------------------------------------------------------------+
only showing top 5 rows
nodes_converted: org.apache.spark.sql.DataFrame = [id: bigint, latitude: double ... 2 more fields]
Once the transformation is done, it is necessary to unpack the coordinates as follows:
// The projected point prints as "Point: [x, y, z, m] SR: 3035".
// After stripping commas and "[", token 2 is the y coordinate (northing).
def unpack_lat(str: String): String = {
  str.replaceAll(",","").replaceAll("\\[","").split(" ")(2)
}
spark.udf.register("unpack_lat", unpack_lat(_:String): String)
// Token 1 is the x coordinate (easting).
def unpack_lon(str: String): String = {
  str.replaceAll(",","").replaceAll("\\[","").split(" ")(1)
}
spark.udf.register("unpack_lon", unpack_lon(_:String): String)
unpack_lat: (str: String)String
unpack_lon: (str: String)String
res5: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,StringType,Some(List(StringType)))
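String surgery with `replaceAll`/`split` is position-sensitive; an alternative (a sketch, not from the original notebook) is to capture both numbers in one pass with a regular expression, assuming the `Point: [x, y, z, m] SR: ...` format shown above:

```scala
// Extract (x, y) from strings like
// "Point: [5294624.872733, 3617234.130316, 0.000000, NaN] SR: 3035"
val pointXY = """\[([-\d.]+), ([-\d.]+)""".r.unanchored

def unpackXY(s: String): Option[(String, String)] = s match {
  case pointXY(x, y) => Some((x, y)) // x = easting (lon), y = northing (lat)
  case _             => None         // malformed input instead of an exception
}
```

Returning an `Option` makes malformed rows explicit rather than throwing an `ArrayIndexOutOfBoundsException` inside the UDF.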
val new_coordinates = nodes_converted.selectExpr("id as node_id", "unpack_lat(new_coord) as reprojected_lat", "unpack_lon(new_coord) as reprojected_lon")
new_coordinates: org.apache.spark.sql.DataFrame = [node_id: bigint, reprojected_lat: string ... 1 more field]
Now, the new coordinates are expressed in meters.
new_coordinates.show(5,false)
+--------+---------------+---------------+
|node_id |reprojected_lat|reprojected_lon|
+--------+---------------+---------------+
|15389886|3617234.130316 |5294624.872733 |
|15389895|3617425.427234 |5294845.235219 |
|15389899|3617805.661476 |5294962.370295 |
|15389959|3617823.871710 |5294901.580186 |
|15389961|3617854.789556 |5294846.689805 |
+--------+---------------+---------------+
only showing top 5 rows
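As a sanity check on the reprojection, a rough back-of-the-envelope degrees-to-meters conversion can be compared against the projected coordinates. This is not the EPSG:3035 transform that `project_to_meters` performs, just a spherical approximation:

```python
import math

# Approximation: near a fixed latitude, one degree of latitude spans roughly
# 111.32 km and one degree of longitude shrinks by cos(latitude).
def approx_meters(dlat_deg, dlon_deg, at_lat_deg):
    m_per_deg = 111_320.0  # rough spherical value
    dy = dlat_deg * m_per_deg
    dx = dlon_deg * m_per_deg * math.cos(math.radians(at_lat_deg))
    return math.hypot(dx, dy)

# Distance between the first two nodes, from raw lat/lon vs. projected coords.
d_geo = approx_meters(54.7321714 - 54.7309125, 25.2436895 - 25.2397012, 54.73)
d_proj = math.hypot(5294845.235219 - 5294624.872733,
                    3617425.427234 - 3617234.130316)
# The two distances agree to well under a metre, even though the individual
# axis deltas differ: the EPSG:3035 (LAEA) grid is rotated relative to
# geographic north at this longitude.
```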
val nodes_new_coordinates = nodes_df.join(new_coordinates, nodes_df.col("id") === new_coordinates.col("node_id")).selectExpr("id", "version", "timestamp", "changeset", "uid", "user_sid", "tags", "reprojected_lat as latitude", "reprojected_lon as longitude")
nodes_new_coordinates: org.apache.spark.sql.DataFrame = [id: bigint, version: int ... 7 more fields]
nodes_new_coordinates.write.parquet("dbfs:/datasets/osm/lithuania/lithuania_nodes_converted.parquet")
Segmentation of Lithuania by municipalities using Magellan
Virginia Jimenez Mohedano (LinkedIn) and Raazesh Sainudiin (LinkedIn).
This project was supported by UAB SENSMETRY through a Data Science Thesis Internship
for Virginia J.M. between 2022-01-17 and 2022-06-05, and by the
Databricks University Alliance with infrastructure credits from AWS to
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.
2022, Uppsala, Sweden
Instructions
- Clone the Magellan repository from https://github.com/rahulbsw/magellan.git.
- Build the jar and copy it to your local machine.
- In Databricks choose Create -> Library and upload the packaged jar.
- Create a Spark 2.4.5 / Scala 2.11 cluster with the uploaded Magellan library installed. If you installed the library on an already-running cluster, detach and re-attach any notebook currently using that cluster.
import magellan.Point
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.functions._
import org.apache.spark.sql.magellan.dsl.expressions._
val toPointUDF = udf{(x:Double,y:Double) => Point(x,y) }
import magellan.Point
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.functions._
import org.apache.spark.sql.magellan.dsl.expressions._
toPointUDF: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,org.apache.spark.sql.types.PointUDT@36ebe9ff,Some(List(DoubleType, DoubleType)))
After downloading the data (see the last cells of the notebook), we expect to have the following files in the distributed file system (dbfs):
- LTcar_reprojected.csv is the file with the crash data from LT.
- municipalities.geojson is the geojson file containing LT municipalities.
Header and first rows of the crash data: id, latitude, longitude, timestamp
//sc.textFile("dbfs:/datasets/magellan/LTcar_reprojected.csv").take(1).foreach(println)
The output of the above command with IDs and locations anonymised is as follows:
id,latitude,longitude,timestamp
LT20xyABCDEF,55.xxxxxx,21.yyyyyy,20xy-mm-dd hh:20:00.000+01:00
case class CrashRecord(id: String, timestamp: String, point: Point)
defined class CrashRecord
Load accident data and transform latitude and longitude to Magellan's Point
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
val crashes = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("dbfs:/datasets/magellan/LTcar_reprojected.csv").toDF()
val crashes_with_points = crashes.select(col("id"), col("timestamp"), col("longitude").cast(DoubleType), col("latitude").cast(DoubleType)).withColumn("point", toPointUDF($"longitude", $"latitude")).drop("latitude", "longitude").filter(col("timestamp").isNotNull)
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
crashes: org.apache.spark.sql.DataFrame = [id: string, latitude: double ... 2 more fields]
crashes_with_points: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [id: string, timestamp: timestamp ... 1 more field]
//crashes.show(1)
The output of the above command with IDs and locations anonymised is as follows:
+------------+---------+---------+-------------------+
| id| latitude|longitude| timestamp|
+------------+---------+---------+-------------------+
|LT20xyABCDEF|55.xxxxxx|21.yyyyyy|20xy-mm-dd hh:20:00|
//crashes_with_points.show(1,false)
The output of the above command with IDs and locations anonymised is as follows:
+------------+-------------------+---------------------------+
|id |timestamp |point |
+------------+-------------------+---------------------------+
|LT20xyABCDEF|20xy-mm-dd hh:20:00|Point(21.yyyyyy, 55.xxxxxx)|
val crashRecordCount = crashes_with_points.count() // how many crash records?
crashRecordCount: Long = 11945
The geojson format can spatially describe vector features: points, lines, and polygons, representing, for example, water wells, rivers, and lakes. Each item usually has attributes that describe it, such as name or temperature.
The municipality's name is stored under the "name" key in the metadata, so let's keep only that field.
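For intuition, here is a minimal GeoJSON FeatureCollection of the kind Magellan reads: each feature pairs a geometry with free-form properties. The square polygon and property values below are made up for illustration; only the "name" property is kept in the query that follows.

```python
import json

# A tiny FeatureCollection with one polygon feature (illustrative coordinates).
fc = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"name": "Visagino savivaldybė", "admin_level": "5"},
        "geometry": {
            "type": "Polygon",
            # One linear ring; the first and last positions must coincide.
            "coordinates": [[[26.3, 55.5], [26.5, 55.5],
                             [26.5, 55.7], [26.3, 55.7], [26.3, 55.5]]],
        },
    }],
}

# Keeping only the "name" property, as the Magellan query does:
names = [f["properties"]["name"] for f in fc["features"]]
```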
val municipalities = sqlContext.read.format("magellan")
.option("type", "geojson")
.load("dbfs:/datasets/magellan/municipalities.geojson")
.filter($"polygon".isNotNull)
.select($"polygon", $"metadata"("name") as "municipality")
municipalities: org.apache.spark.sql.DataFrame = [polygon: polygon, municipality: string]
municipalities.count()
res41: Long = 60
municipalities.show(100)
+--------------------+--------------------+
| polygon| municipality|
+--------------------+--------------------+
|magellan.Polygon@...|Visagino savivaldybė|
|magellan.Polygon@...|Ignalinos rajono ...|
|magellan.Polygon@...|Zarasų rajono sav...|
|magellan.Polygon@...|Vilkaviškio rajon...|
|magellan.Polygon@...|Šakių rajono savi...|
|magellan.Polygon@...|Utenos rajono sav...|
|magellan.Polygon@...|Švenčionių rajono...|
|magellan.Polygon@...|Šiaulių miesto sa...|
|magellan.Polygon@...|Panevėžio miesto ...|
|magellan.Polygon@...|Elektrėnų savival...|
|magellan.Polygon@...|Vilniaus miesto s...|
|magellan.Polygon@...|Marijampolės savi...|
|magellan.Polygon@...|Kazlų Rūdos saviv...|
|magellan.Polygon@...|Kalvarijos saviva...|
|magellan.Polygon@...|Kauno rajono savi...|
|magellan.Polygon@...|Vilniaus rajono s...|
|magellan.Polygon@...| Pagėgių savivaldybė|
|magellan.Polygon@...|Molėtų rajono sav...|
|magellan.Polygon@...|Anykščių rajono s...|
|magellan.Polygon@...|Klaipėdos miesto ...|
|magellan.Polygon@...|Šalčininkų rajono...|
|magellan.Polygon@...|Širvintų rajono s...|
|magellan.Polygon@...|Trakų rajono savi...|
|magellan.Polygon@...|Palangos miesto s...|
|magellan.Polygon@...|Kretingos rajono ...|
|magellan.Polygon@...|Ukmergės rajono s...|
|magellan.Polygon@...|Panevėžio rajono ...|
|magellan.Polygon@...|Kauno miesto savi...|
|magellan.Polygon@...|Druskininkų saviv...|
|magellan.Polygon@...|Varėnos rajono sa...|
|magellan.Polygon@...|Neringos savivaldybė|
|magellan.Polygon@...|Lazdijų rajono sa...|
|magellan.Polygon@...|Alytaus rajono sa...|
|magellan.Polygon@...|Alytaus miesto sa...|
|magellan.Polygon@...|Rokiškio rajono s...|
|magellan.Polygon@...|Biržų rajono savi...|
|magellan.Polygon@...|Kupiškio rajono s...|
|magellan.Polygon@...| Rietavo savivaldybė|
|magellan.Polygon@...|Pasvalio rajono s...|
|magellan.Polygon@...|Šilutės rajono sa...|
|magellan.Polygon@...|Skuodo rajono sav...|
|magellan.Polygon@...|Klaipėdos rajono ...|
|magellan.Polygon@...|Mažeikių rajono s...|
|magellan.Polygon@...|Pakruojo rajono s...|
|magellan.Polygon@...|Joniškio rajono s...|
|magellan.Polygon@...|Šiaulių rajono sa...|
|magellan.Polygon@...|Akmenės rajono sa...|
|magellan.Polygon@...|Radviliškio rajon...|
|magellan.Polygon@...|Kelmės rajono sav...|
|magellan.Polygon@...|Prienų rajono sav...|
|magellan.Polygon@...|Plungės rajono sa...|
|magellan.Polygon@...|Telšių rajono sav...|
|magellan.Polygon@...|Jonavos rajono sa...|
|magellan.Polygon@...|Raseinių rajono s...|
|magellan.Polygon@...|Tauragės rajono s...|
|magellan.Polygon@...|Kaišiadorių rajon...|
|magellan.Polygon@...|Šilalės rajono sa...|
|magellan.Polygon@...|Kėdainių rajono s...|
|magellan.Polygon@...|Jurbarko rajono s...|
|magellan.Polygon@...|Birštono savivaldybė|
+--------------------+--------------------+
//If the two datasets share the same coordinate system, the next cell should not be empty
//The geojson file is in the WGS84 coordinate system
Join the accidents with the municipalities.
val joined = municipalities
.join(crashes_with_points)
.where($"point" within $"polygon")
.select($"id", $"timestamp", $"municipality", $"point")
joined: org.apache.spark.sql.DataFrame = [id: string, timestamp: timestamp ... 2 more fields]
//joined.show(1,false)
The output of the above command with IDs and locations anonymised is as follows:
+------------+-------------------+--------------------+---------------------------+
|id |timestamp |municipality |point |
+------------+-------------------+--------------------+---------------------------+
|LT20xyABCDEF|2019-09-08 20:10:00|Visagino savivaldybė|Point(26.xxxxxx, 55.yyyyy) |
val crashes_in_municipalities = joined.count()
crashes_in_municipalities: Long = 11937
crashRecordCount - crashes_in_municipalities // records not inside any municipality polygon in the geojson file
res45: Long = 8
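For intuition about what the `within` predicate computes, here is the classic ray-casting point-in-polygon test in plain Python. Magellan uses its own optimized spatial-join implementation; this sketch, with made-up coordinates, only illustrates the geometric idea:

```python
def point_in_polygon(x, y, poly):
    """Ray casting: count crossings of a horizontal ray going right from (x, y)."""
    inside = False
    j = len(poly) - 1
    for i in range(len(poly)):
        xi, yi = poly[i]
        xj, yj = poly[j]
        # Edge (i, j) crosses the ray iff it straddles y and the crossing
        # point lies to the right of x; each crossing toggles the parity.
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

square = [(26.3, 55.5), (26.5, 55.5), (26.5, 55.7), (26.3, 55.7)]
point_in_polygon(26.4, 55.6, square)   # → True  (inside)
point_in_polygon(26.0, 55.6, square)   # → False (outside)
```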
val municipality_count = joined
.groupBy($"municipality")
.agg(countDistinct("id").as("acc_count"))
.orderBy(col("acc_count").desc)
municipality_count.show(5,false)
+----------------------------+---------+
|municipality |acc_count|
+----------------------------+---------+
|Vilniaus miesto savivaldybė |2356 |
|Kauno miesto savivaldybė |1461 |
|Klaipėdos miesto savivaldybė|733 |
|Panevėžio miesto savivaldybė|592 |
|Šiaulių miesto savivaldybė |468 |
+----------------------------+---------+
only showing top 5 rows
municipality_count: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [municipality: string, acc_count: bigint]
val municipality_count_freq = municipality_count.withColumn("frequency", col("acc_count")/crashes_in_municipalities)
municipality_count_freq.show(10,false)
+----------------------------+---------+--------------------+
|municipality |acc_count|frequency |
+----------------------------+---------+--------------------+
|Vilniaus miesto savivaldybė |2356 |0.19736952333082014 |
|Kauno miesto savivaldybė |1461 |0.12239256094496105 |
|Klaipėdos miesto savivaldybė|733 |0.061405713328306945|
|Panevėžio miesto savivaldybė|592 |0.04959370025969674 |
|Šiaulių miesto savivaldybė |468 |0.039205830610706205|
|Vilniaus rajono savivaldybė |430 |0.03602245120214459 |
|Kauno rajono savivaldybė |347 |0.02906928038870738 |
|Klaipėdos rajono savivaldybė|289 |0.02421043813353439 |
|Panevėžio rajono savivaldybė|280 |0.02345647985255927 |
|Šiaulių rajono savivaldybė |214 |0.01792745245874173 |
+----------------------------+---------+--------------------+
only showing top 10 rows
municipality_count_freq: org.apache.spark.sql.DataFrame = [municipality: string, acc_count: bigint ... 1 more field]
municipality_count_freq.select("municipality","frequency").write.format("csv").option("header", true).save("dbfs:/datasets/lithuania/municipalities_freq.csv")
Download the latest population data from https://www.registrucentras.lt/p/853 and upload it to dbfs.
val municipality_pop = spark.read.format("csv").option("delimiter",";").option("header", "true").option("inferSchema", "true").load("dbfs:/datasets/lithuania/population.csv").toDF()
municipality_pop: org.apache.spark.sql.DataFrame = [municipality: string, population: int]
municipality_pop.show()
+--------------------+----------+
| municipality|population|
+--------------------+----------+
|Akmenės rajono sa...| 20597|
|Alytaus miesto sa...| 53920|
|Alytaus rajono sa...| 28170|
|Anykščių rajono s...| 24619|
|Birštono savivaldybė| 4425|
|Biržų rajono savi...| 25141|
|Druskininkų saviv...| 21282|
|Elektrėnų savival...| 25903|
|Ignalinos rajono ...| 15495|
|Jonavos rajono sa...| 43564|
|Joniškio rajono s...| 22234|
|Jurbarko rajono s...| 27145|
|Kaišiadorių rajon...| 29746|
|Kalvarijos saviva...| 10737|
|Kauno miesto savi...| 313503|
|Kauno rajono savi...| 105032|
|Kazlų Rūdos saviv...| 11621|
|Kelmės rajono sav...| 27513|
|Klaipėdos miesto ...| 165710|
|Klaipėdos rajono ...| 67232|
+--------------------+----------+
only showing top 20 rows
val municipality_count_pop = municipality_count.join(municipality_pop, municipality_count.col("municipality") === municipality_pop.col("municipality")).withColumn("acc_by_pop", col("acc_count")/col("population")).drop(municipality_pop.col("municipality"))
municipality_count_pop: org.apache.spark.sql.DataFrame = [municipality: string, acc_count: bigint ... 2 more fields]
municipality_count_pop.show()
+--------------------+---------+----------+--------------------+
| municipality|acc_count|population| acc_by_pop|
+--------------------+---------+----------+--------------------+
|Vilniaus miesto s...| 2356| 592389|0.003977116388049069|
|Kauno miesto savi...| 1461| 313503| 0.00466024248571784|
|Klaipėdos miesto ...| 733| 165710|0.004423390260092...|
|Panevėžio miesto ...| 592| 91221|0.006489733723594348|
|Šiaulių miesto sa...| 468| 111289| 0.00420526736694552|
|Vilniaus rajono s...| 430| 108948|0.003946837023167015|
|Kauno rajono savi...| 347| 105032|0.003303755046081194|
|Klaipėdos rajono ...| 289| 67232|0.004298548310328415|
|Panevėžio rajono ...| 280| 38639|0.007246564352079505|
|Šiaulių rajono sa...| 214| 43923|0.004872162648270837|
|Šilutės rajono sa...| 196| 42330|0.004630285849279471|
|Plungės rajono sa...| 185| 35804|0.005167020444643056|
|Raseinių rajono s...| 175| 32598|0.005368427510890238|
|Telšių rajono sav...| 170| 42883|0.003964274887484551|
|Jonavos rajono sa...| 168| 43564|0.003856395188687...|
|Kėdainių rajono s...| 167| 49360|0.003383306320907...|
|Tauragės rajono s...| 165| 41256|0.003999418266433973|
|Marijampolės savi...| 162| 57937| 0.00279614063551789|
|Alytaus miesto sa...| 161| 53920|0.002985905044510386|
|Trakų rajono savi...| 160| 35864|0.004461298237787196|
+--------------------+---------+----------+--------------------+
only showing top 20 rows
municipality_count_pop.select("municipality","acc_by_pop").write.format("csv").option("header", true).save("dbfs:/datasets/lithuania/municipalities_pop.csv")
Step 0: Download the datasets and load them into dbfs
- get the accident data
- get the Lithuanian municipality data
dbutils.fs.cp("dbfs:/FileStore/tables/ltcar_reprojected.csv", "dbfs:/datasets/magellan/LTcar_reprojected.csv")
res5: Boolean = true
display(dbutils.fs.ls("dbfs:/datasets/magellan/"))
| path | name | size |
|---|---|---|
| dbfs:/datasets/magellan/LT_adm/ | LT_adm/ | 0.0 |
| dbfs:/datasets/magellan/LTbhd/ | LTbhd/ | 0.0 |
| dbfs:/datasets/magellan/LTcar_locations.csv | LTcar_locations.csv | 706938.0 |
| dbfs:/datasets/magellan/LTcar_reprojected.csv | LTcar_reprojected.csv | 752891.0 |
| dbfs:/datasets/magellan/SFNbhd/ | SFNbhd/ | 0.0 |
| dbfs:/datasets/magellan/all.tsv | all.tsv | 6.0947802e7 |
Getting Lithuanian Administrative Divisions Data
Second-level Administrative Divisions, Lithuania, 2015
Data from https://github.com/seporaitis/lt-geojson
wget https://raw.githubusercontent.com/seporaitis/lt-geojson/master/geojson/municipalities.geojson
--2022-04-18 15:00:50-- https://raw.githubusercontent.com/seporaitis/lt-geojson/master/geojson/municipalities.geojson
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9590686 (9.1M) [text/plain]
Saving to: ‘municipalities.geojson’
2022-04-18 15:00:51 (44.9 MB/s) - ‘municipalities.geojson’ saved [9590686/9590686]
# Reading and processing the geojson. Removing the (@)relations properties
# (loading fails with them for some reason, and they are not needed).
import json
# municipalities / Savivaldybės
municipalities = json.load(open("municipalities.geojson", 'r'))
for feature in municipalities['features']:
    feature["properties"].pop("relations", None)
    feature["properties"].pop("@relations", None)
# Print the remaining property keys of every feature as a sanity check
for feature in municipalities['features']:
    for property in feature["properties"]:
        print(property)
with open("municipalities.geojson", 'w') as outfile:
    json.dump(municipalities, outfile)
@id
ISO3166-2
admin_level
boundary
name
name:lt
name:pl
name:ru
type
wikidata
wikipedia
... (the same core keys, with occasional extras such as name:de, name:en, is_in:country_code, website, and source, repeat for each remaining polygon feature; the trailing point features list only @id)
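The property cleanup above can be exercised on a toy feature list; this is a minimal sketch with made-up feature names, not the actual OSM export:

```python
# Minimal sketch of popping the "@relations" key from GeoJSON feature properties.
features = [
    {"properties": {"@id": "relation/1", "name": "A", "@relations": []}},
    {"properties": {"@id": "relation/2", "name": "B"}},  # no "@relations" key
]
for feature in features:
    # pop with a default avoids KeyError when the key is absent
    feature["properties"].pop("@relations", None)

remaining = [sorted(f["properties"]) for f in features]
print(remaining)
```

Using `pop(key, None)` rather than `del` makes the cleanup idempotent: features that never had the key pass through untouched.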
dbutils.fs.cp("file:/databricks/driver/municipalities.geojson", "dbfs:/datasets/magellan/")
res36: Boolean = true
Visualization of the segmentation by municipality using Python.
Virginia Jimenez Mohedano (LinkedIn) and Raazesh Sainudiin (LinkedIn).
This project was supported by UAB SENSMETRY through a Data Science Thesis Internship
(2022-01-17 to 2022-06-05) awarded to Virginia J.M., and by the
Databricks University Alliance with infrastructure credits from AWS to
Raazesh Sainudiin, Department of Mathematics, Uppsala University, Sweden.
2022, Uppsala, Sweden
# Reading the per-municipality accident frequencies obtained earlier
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StringType, DoubleType

schema = StructType() \
    .add("municipality", StringType(), True) \
    .add("frequency", DoubleType(), True)
municipality_freq = spark.read.format("csv").option("header", True).schema(schema).load("dbfs:/datasets/lithuania/municipalities_freq.csv")
municipality_freq.show(1000)
+--------------------+--------------------+
| municipality| frequency|
+--------------------+--------------------+
|Kaišiadorių rajon...|0.010304096506659964|
|Kelmės rajono sav...|0.010304096506659964|
|Pakruojo rajono s...|0.003853564547206...|
|Skuodo rajono sav...|0.003853564547206...|
|Elektrėnų savival...|0.004356203401189578|
|Kazlų Rūdos saviv...|0.004356203401189578|
|Neringos savivaldybė|0.001507916561950...|
|Birštono savivaldybė|0.001507916561950...|
|Šalčininkų rajono...|0.009047499371701432|
|Švenčionių rajono...|0.005277707966825836|
|Radviliškio rajon...|0.011979559353271342|
|Vilkaviškio rajon...|0.008628633660048589|
|Širvintų rajono s...|0.003518471977883...|
|Klaipėdos miesto ...|0.061405713328306945|
|Panevėžio miesto ...| 0.04959370025969674|
|Panevėžio rajono ...| 0.02345647985255927|
|Kėdainių rajono s...|0.013990114769204993|
|Mažeikių rajono s...|0.013236156488229874|
|Anykščių rajono s...| 0.00636675881712323|
|Šiaulių miesto sa...|0.039205830610706205|
|Klaipėdos rajono ...| 0.02421043813353439|
|Šilutės rajono sa...|0.016419535896791487|
|Raseinių rajono s...|0.014660299907849544|
|Tauragės rajono s...|0.013822568484543855|
|Rokiškio rajono s...|0.008293541090726313|
|Ukmergės rajono s...|0.008042221663734606|
|Šilalės rajono sa...|0.006199212532462093|
|Molėtų rajono sav...|0.005612800536148...|
|Joniškio rajono s...|0.005445254251486974|
|Kupiškio rajono s...|0.005193934824495267|
|Akmenės rajono sa...|0.004272430258859...|
|Ignalinos rajono ...|0.003602245120214459|
|Šiaulių rajono sa...| 0.01792745245874173|
|Plungės rajono sa...|0.015498031331155232|
|Telšių rajono sav...|0.014241434196196699|
|Kretingos rajono ...| 0.01130937421462679|
|Pasvalio rajono s...|0.010806735360643378|
|Palangos miesto s...|0.010387869648990534|
|Varėnos rajono sa...|0.007874675379073468|
|Lazdijų rajono sa...|0.004775069112842423|
|Jurbarko rajono s...|0.003937337689536734|
|Zarasų rajono sav...|0.002680740554578...|
|Vilniaus miesto s...| 0.19736952333082014|
|Vilniaus rajono s...| 0.03602245120214459|
|Jonavos rajono sa...|0.014073887911535563|
|Prienų rajono sav...|0.009382591941023708|
|Šakių rajono savi...|0.008963726229370864|
|Biržų rajono savi...|0.006701851386445506|
|Alytaus miesto sa...| 0.01348747591522158|
|Trakų rajono savi...|0.013403702772891012|
|Alytaus rajono sa...| 0.01206333249560191|
|Utenos rajono sav...|0.010555415933651672|
|Druskininkų saviv...|0.002596967412247...|
|Marijampolės savi...| 0.01357124905755215|
|Kauno miesto savi...| 0.12239256094496105|
|Kauno rajono savi...| 0.02906928038870738|
|Kalvarijos saviva...|0.002261874842925358|
| Pagėgių savivaldybė|0.001424143419619...|
|Visagino savivaldybė|0.002513194269917...|
| Rietavo savivaldybė|0.003183379408561615|
+--------------------+--------------------+
# Calculating colors
# https://matplotlib.org/stable/tutorials/colors/colormaps.html
from matplotlib.cm import viridis
from matplotlib.colors import to_hex
min_freq = municipality_freq.agg({"frequency":"min"}).collect()[0][0]
max_freq = municipality_freq.agg({"frequency":"max"}).collect()[0][0]
freq_range = max_freq - min_freq
def calculate_color(row):
    """
    Convert a row's frequency to a CSS hex color via the viridis colormap.
    """
    freq = row["frequency"]
    # Min-max normalize freq to [0, 1]
    normalized_freq = (freq - min_freq) / freq_range
    # Invert: in viridis, darker means lower values and we want the opposite
    inverse_freq = 1 - normalized_freq
    # Map the coefficient to a matplotlib RGBA color
    mpl_color = viridis(inverse_freq)
    # Convert the matplotlib color to a valid CSS color
    gmaps_color = to_hex(mpl_color, keep_alpha=False)
    return (row["municipality"], gmaps_color)
# Calculate a color for each district
colors = municipality_freq.rdd.map(lambda row: calculate_color(row)).collectAsMap()
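The min-max normalization and colormap lookup inside `calculate_color` can be tested without Spark. A minimal sketch with two made-up frequencies (matplotlib assumed available):

```python
from matplotlib.cm import viridis
from matplotlib.colors import to_hex

# Toy frequencies standing in for the per-municipality values
freqs = {"A": 0.0015, "B": 0.197}
min_freq = min(freqs.values())
max_freq = max(freqs.values())
freq_range = max_freq - min_freq

def freq_to_css_color(freq):
    # Min-max normalize to [0, 1], then invert so higher
    # frequencies map to the darker end of viridis
    normalized = (freq - min_freq) / freq_range
    return to_hex(viridis(1 - normalized), keep_alpha=False)

colors = {name: freq_to_css_color(f) for name, f in freqs.items()}
```

The minimum frequency lands on the bright end of viridis and the maximum on the dark end, matching the notebook's "darker means more accidents" convention.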
// Temporary local copy of the geojson so Python can read it
dbutils.fs.cp("dbfs:/datasets/magellan/municipalities.geojson", "file:/databricks/driver/")
res0: Boolean = true
# Reading and processing geojson (map and borders)
import json
import gmaps
import gmaps.datasets
import gmaps.geojson_geometries
from ipywidgets.embed import embed_minimal_html
gmaps.configure(api_key="YOUR_GOOGLE_API_KEY") # Replace with your own Google Maps API key; never commit real keys
# municipalities / Savivaldybės
municipalities = json.load(open('municipalities.geojson', 'r'))
# Flag features that are not polygons (municipality capitals are point features)
list_to_remove = []
i = 0
for feature in municipalities['features']:
    if feature["geometry"]["type"] != "Polygon":
        list_to_remove.append(i)
    i += 1
# Delete the flagged features in reverse index order so earlier indices stay valid
for index in sorted(list_to_remove, reverse=True):
    del municipalities['features'][index]
# Order the colors to match the geojson feature order
ordered_colors = []
for feature in municipalities['features']:
    municipality = feature['properties']['name']
    color = colors[municipality]
    ordered_colors.append(color)
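The polygon filter above can be checked on a toy GeoJSON-like dict (hypothetical names, not the real municipalities file):

```python
municipalities = {"features": [
    {"geometry": {"type": "Polygon"}, "properties": {"name": "A"}},
    {"geometry": {"type": "Point"},   "properties": {"name": "A (capital)"}},
    {"geometry": {"type": "Polygon"}, "properties": {"name": "B"}},
]}

# Flag every non-polygon feature by index
to_remove = [i for i, f in enumerate(municipalities["features"])
             if f["geometry"]["type"] != "Polygon"]

# Delete in reverse order so earlier indices remain valid
for index in sorted(to_remove, reverse=True):
    del municipalities["features"][index]

names = [f["properties"]["name"] for f in municipalities["features"]]
```

Deleting in reverse index order is the key detail: removing a low index first would shift every later index and delete the wrong features.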
import matplotlib
from matplotlib import cm
# Generating map
fig = gmaps.figure()
freq_layer = gmaps.geojson_layer(
    municipalities,
    fill_color=ordered_colors,
    fill_opacity=0.8,
    stroke_color='black',
    stroke_opacity=1.0,
    stroke_weight=0.2)
fig.add_layer(freq_layer)
embed_minimal_html("export.html", views=[fig])
# Adding color legend to map
# Sample 20 discrete viridis colors and build a CSS linear-gradient stop list
cmap = cm.get_cmap('viridis', 20)
gradient = ""
for i in reversed(range(cmap.N)):
    rgba = cmap(i)
    # rgb2hex accepts rgb or rgba
    gradient = gradient + "," + matplotlib.colors.rgb2hex(rgba)
# Remove the leading comma
gradient = gradient[1:]
html_file_content = open("export.html", 'r').read()\
.replace("</head>", """<style>
.legend {
max-width: 430px;
}
.legend div{
background: linear-gradient(to right, """ + gradient + """);
border-radius: 4px;
padding: 10px;
}
.legend p {
text-align: justify;
text-justify: inter-word;
margin: 0px;
margin-block-start: 0em;
margin-block-end: 0em;
height: 1em;
}
.legend p:after {
content: "";
display: inline-block;
width: 100%;
}
</style>
</head>""")\
.replace("</body>","""
<h2>Relative frequency of accidents</h2>
<div class="legend">
<p>""" + str(round(min_freq,2)) + " " + str(round(max_freq,2)) +"""</p>
<div></div>
</div>
</body>""")
# NOTE: this cell can only be run once per cluster restart
displayHTML(html_file_content)
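The legend-injection pattern used above, splicing extra markup immediately before the closing `</head>` and `</body>` tags of the exported page, can be sketched on a tiny HTML string:

```python
html = "<html><head></head><body><p>map</p></body></html>"

legend_css = "<style>.legend { padding: 10px; }</style>"
legend_html = '<div class="legend">Relative frequency of accidents</div>'

# str.replace on the closing tags keeps the document well-formed:
# CSS goes at the end of <head>, the legend at the end of <body>
patched = (html
           .replace("</head>", legend_css + "</head>")
           .replace("</body>", legend_html + "</body>"))
```

This works because each closing tag occurs exactly once in the exported file; with repeated tags a real HTML parser would be the safer choice.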


